Skip to content

Advanced Databases and Multi-source Search

This document analyzes specialized databases, search engines, and federation architectures to meet the complex needs of bibliographic search.

See also:

Researcher Use Cases

Identified Needs

┌─────────────────────────────────────────────────────────────────────────────┐
│                           RESEARCHER NEEDS                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DISCOVERY                                                                   │
│  ─────────                                                                   │
│  "Who are the experts in ethical machine learning in Paris?"                │
│  "Which articles cite my work and work on similar topics?"                  │
│  "What are the emerging trends in my field?"                                │
│  "Who collaborates with whom in my thematic network?"                       │
│                                                                              │
│  EXPLORATION                                                                 │
│  ───────────                                                                 │
│  "Show me the evolution of CRISPR publications since 2012"                  │
│  "Which European labs work on solid-state batteries?"                       │
│  "Visualize this researcher's co-author graph"                              │
│  "Which funders support this field?"                                        │
│                                                                              │
│  MONITORING                                                                  │
│  ──────────                                                                  │
│  "Alert me when someone cites my article"                                   │
│  "New preprints in my field this week"                                      │
│  "Evolution of my competitors' h-index"                                     │
│                                                                              │
│  ANALYSIS                                                                    │
│  ────────                                                                    │
│  "Semantic similarity between two article corpora"                          │
│  "Clustering of my laboratory's research topics"                            │
│  "Impact prediction for a preprint"                                         │
│                                                                              │
│  REPORTING                                                                   │
│  ─────────                                                                   │
│  "Bibliometric export for HCERES evaluation"                                │
│  "Publication statistics by team"                                           │
│  "Mapping of international collaborations"                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Translated Query Types

User NeedData DimensionsRequired Indexes
Expert on a topic in a cityText + Geo + GraphFull-text, Spatial, Graph
Temporal evolution of a fieldTime-series + TextTemporal, Full-text
Similar articlesSemantic vectorsVector (ANN)
Collaboration networkGraphGraph traversal
Emerging trendsTime-series + NLPTemporal, Vector
Impact predictionML featuresVector, Aggregations

Multi-model Databases

ArangoDB

Overview

ArangoDB is a native multi-model database combining documents, graphs, and search in a single system.

┌─────────────────────────────────────────────────────────────────────────────┐
│                              ARANGODB                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SUPPORTED MODELS                                                            │
│  ────────────────                                                            │
│  ✓ Documents (JSON) - Like MongoDB                                          │
│  ✓ Graphs - Native traversals, faster than Neo4j in some cases             │
│  ✓ Key-Value - Built-in cache                                               │
│  ✓ Search (ArangoSearch) - Full-text with BM25 scoring                      │
│  ✓ GeoJSON - Spatial indexes                                                │
│                                                                              │
│  QUERY LANGUAGE: AQL                                                         │
│  ───────────────────                                                         │
│  Unified for all models, similar to SQL with graph extensions               │
│                                                                              │
│  EXAMPLE - Experts on a topic in a region:                                  │
│  ──────────────────────────────────────────────────────────────────────     │
│  FOR author IN authors                                                       │
│    FILTER GEO_DISTANCE(author.location, @parisCenter) < 50000               │
│    LET expertise = (                                                         │
│      FOR v, e IN 1..1 OUTBOUND author authored                              │
│        FOR work IN works FILTER work._id == v._id                           │
│          FILTER ANALYZER(work.title, "text_en") LIKE "%machine learning%"   │
│        RETURN 1                                                              │
│    )                                                                         │
│    FILTER LENGTH(expertise) > 5                                             │
│    SORT LENGTH(expertise) DESC                                               │
│    RETURN { author, expertiseCount: LENGTH(expertise) }                     │
│                                                                              │
│  KUBERNETES                                                                  │
│  ──────────                                                                  │
│  • ArangoDB Kubernetes Operator (official)                                  │
│  • Helm chart: arangodb/kube-arangodb                                       │
│  • Modes: Single, ActiveFailover, Cluster (sharding)                        │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Model for Atlas

javascript
// Collection: works (documents)
{
  _key: "W2741809807",
  title: "Attention Is All You Need",
  abstract: "The dominant sequence transduction models...",
  year: 2017,
  doi: "10.48550/arXiv.1706.03762",
  venue: "NeurIPS",
  citationCount: 89000,
  embedding: [0.12, -0.34, ...],  // SPECTER vector 768d
  location: null,  // For works with location (conferences)
  keywords: ["transformer", "attention", "NLP"],
  _source: "openalex"
}

// Collection: authors (documents)
{
  _key: "A123456",
  displayName: "Ashish Vaswani",
  orcid: "0000-0002-1234-5678",
  affiliations: [
    {
      institution: "Google Brain",
      ror: "03yrm5c26",
      location: { type: "Point", coordinates: [37.4220, -122.0841] },
      startYear: 2015
    }
  ],
  hIndex: 45,
  topics: ["deep learning", "NLP", "transformers"]
}

// Collection: institutions (documents)
{
  _key: "I27837315",
  name: "Google",
  ror: "03yrm5c26",
  type: "company",
  location: {
    type: "Point",
    coordinates: [-122.0841, 37.4220]  // Mountain View
  },
  country: "US"
}

// Edge collection: authored (graph)
{
  _from: "authors/A123456",
  _to: "works/W2741809807",
  position: 1,
  isCorresponding: true,
  affiliationAtTime: "Google Brain"
}

// Edge collection: cites (graph)
{
  _from: "works/W2741809807",
  _to: "works/W1234567890",
  context: "building upon the work of..."
}

// Edge collection: affiliated (graph)
{
  _from: "authors/A123456",
  _to: "institutions/I27837315",
  startYear: 2015,
  endYear: null,
  role: "Research Scientist"
}

// Edge collection: collaborated (graph, computed)
{
  _from: "authors/A123456",
  _to: "authors/A789012",
  coauthorships: 15,
  firstCollab: 2015,
  lastCollab: 2024
}

Advanced AQL Queries

sql
// 1. Find experts on a topic in a geographic area
FOR author IN authors
  // Geographic filter: authors within 50km radius of Paris
  FILTER author.affiliations[*].location != null
  LET currentAffiliation = FIRST(
    FOR aff IN author.affiliations
      FILTER aff.endYear == null
      RETURN aff
  )
  FILTER currentAffiliation != null
  FILTER GEO_DISTANCE(
    currentAffiliation.location,
    GEO_POINT(2.3522, 48.8566)  // Paris
  ) < 50000

  // Count publications on the topic
  LET relevantWorks = (
    FOR v IN 1..1 OUTBOUND author authored
      SEARCH ANALYZER(v.title IN TOKENS("machine learning ethics", "text_en"), "text_en")
         OR ANALYZER(v.abstract IN TOKENS("machine learning ethics", "text_en"), "text_en")
      RETURN v
  )

  FILTER LENGTH(relevantWorks) >= 3

  // Expertise score
  LET expertiseScore = SUM(relevantWorks[*].citationCount) / LENGTH(relevantWorks)

  SORT expertiseScore DESC
  LIMIT 20

  RETURN {
    author: author.displayName,
    orcid: author.orcid,
    institution: currentAffiliation.institution,
    publicationsOnTopic: LENGTH(relevantWorks),
    avgCitations: expertiseScore,
    hIndex: author.hIndex
  }

// 2. Co-author graph with depth
FOR author IN authors
  FILTER author._key == "A123456"

  // Traverse co-author graph up to depth 2
  FOR v, e, p IN 1..2 OUTBOUND author collaborated
    OPTIONS { bfs: true, uniqueVertices: "global" }

    LET distance = LENGTH(p.edges)

    RETURN {
      coauthor: v.displayName,
      distance: distance,
      sharedWorks: e.coauthorships,
      path: p.vertices[*].displayName
    }

// 3. Temporal evolution of a field
FOR work IN works
  SEARCH ANALYZER(work.title IN TOKENS("CRISPR gene editing", "text_en"), "text_en")
     OR ANALYZER(work.abstract IN TOKENS("CRISPR gene editing", "text_en"), "text_en")

  COLLECT year = work.year INTO worksPerYear

  SORT year ASC

  RETURN {
    year: year,
    count: LENGTH(worksPerYear),
    totalCitations: SUM(worksPerYear[*].work.citationCount),
    topPapers: (
      FOR w IN worksPerYear[*].work
        SORT w.citationCount DESC
        LIMIT 3
        RETURN { title: w.title, citations: w.citationCount }
    )
  }

// 4. Similar articles (nearest neighbors on embeddings)
LET targetWork = DOCUMENT("works/W2741809807")

FOR work IN works
  FILTER work._key != targetWork._key

  // Cosine distance on embeddings
  LET similarity = (
    LET dotProduct = SUM(
      FOR i IN 0..LENGTH(targetWork.embedding)-1
        RETURN targetWork.embedding[i] * work.embedding[i]
    )
    LET normA = SQRT(SUM(FOR x IN targetWork.embedding RETURN x*x))
    LET normB = SQRT(SUM(FOR x IN work.embedding RETURN x*x))
    RETURN dotProduct / (normA * normB)
  )[0]

  FILTER similarity > 0.7

  SORT similarity DESC
  LIMIT 10

  RETURN {
    title: work.title,
    year: work.year,
    similarity: similarity,
    doi: work.doi
  }

// 5. Citation path between two articles
FOR path IN OUTBOUND K_SHORTEST_PATHS
  "works/W2741809807" TO "works/W9999999"
  cites
  OPTIONS { weightAttribute: "weight" }
  LIMIT 5

  RETURN {
    length: LENGTH(path.edges),
    papers: path.vertices[*].title
  }

ArangoDB Evaluation

AspectScoreNotes
Native multi-model⭐⭐⭐⭐⭐Unified Documents + Graph + Search
Graph⭐⭐⭐⭐Very good, but Neo4j more mature
Full-text⭐⭐⭐⭐ArangoSearch based on IResearch
Geospatial⭐⭐⭐⭐Native GeoJSON
Vector search⭐⭐⭐Possible but not optimized (no ANN)
Kubernetes⭐⭐⭐⭐Mature official operator
Scalability⭐⭐⭐⭐Sharding, but complex
Community⭐⭐⭐Smaller than Postgres/Mongo

SurrealDB

Overview

SurrealDB is a next-generation multi-model database with SQL-like queries and modern features.

┌─────────────────────────────────────────────────────────────────────────────┐
│                             SURREALDB                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  CHARACTERISTICS                                                             │
│  ───────────────                                                             │
│  • Documents, Graphs, Relations, Time-series                                │
│  • SurrealQL (SQL-like with extensions)                                     │
│  • Granular row-level permissions                                           │
│  • Real-time subscriptions (WebSocket)                                      │
│  • Embeddable or server                                                      │
│                                                                              │
│  STRENGTHS                                                                   │
│  ─────────                                                                   │
│  ✓ Record links (native relations)                                          │
│  ✓ Computed fields                                                           │
│  ✓ Events and triggers                                                       │
│  ✓ Built-in full-text search                                                │
│  ✓ Geospatial                                                                │
│                                                                              │
│  WEAKNESSES                                                                  │
│  ──────────                                                                  │
│  ✗ Very young (v1.0 end of 2023)                                            │
│  ✗ Not yet production-ready for large volumes                               │
│  ✗ No official Kubernetes operator                                          │
│  ✗ Basic vector search                                                       │
│                                                                              │
│  VERDICT: Worth watching, not recommended for immediate production          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Specialized Databases

Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                       VECTOR DATABASES COMPARED                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                 │ Milvus  │ Qdrant  │ Weaviate │ Pinecone │ pgvector │     │
│  ───────────────┼─────────┼─────────┼──────────┼──────────┼──────────┤     │
│  Performance    │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐⭐⭐│ ⭐⭐⭐⭐   │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐     │     │
│  Scalability    │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐   │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐     │     │
│  Kubernetes     │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐   │ N/A      │ ⭐⭐⭐⭐⭐  │     │
│  Open Source    │ ✅       │ ✅       │ ✅        │ ❌       │ ✅        │     │
│  Hybrid filters │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐⭐│ ⭐⭐⭐⭐   │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐   │     │
│  Multi-tenancy  │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐⭐│ ⭐⭐⭐⭐   │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐     │     │
│  PG Integration │ ❌       │ ❌       │ ❌        │ ❌       │ ✅        │     │
│  Simplicity     │ ⭐⭐⭐    │ ⭐⭐⭐⭐  │ ⭐⭐⭐     │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐⭐⭐  │     │
│  ───────────────┴─────────┴─────────┴──────────┴──────────┴──────────┘     │
│                                                                              │
│  ATLAS RECOMMENDATION                                                        │
│  ────────────────────                                                        │
│                                                                              │
│  Option A: pgvector (simplicity)                                            │
│  ───────────────────────────────                                             │
│  • PostgreSQL extension, no new database                                    │
│  • Sufficient for < 10M vectors                                             │
│  • HNSW index available since v0.5                                          │
│  • Native SQL filtering                                                      │
│                                                                              │
│  Option B: Qdrant (performance + features)                                  │
│  ─────────────────────────────────────────                                   │
│  • Best performance/simplicity ratio                                        │
│  • Excellent hybrid filtering                                               │
│  • Rust, very memory efficient                                              │
│  • Official Helm chart                                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Qdrant for Atlas

yaml
# Helm install
# helm repo add qdrant https://qdrant.github.io/qdrant-helm
# helm install qdrant qdrant/qdrant -n atlas-verify -f values.yaml

# values.yaml
replicaCount: 1

persistence:
  enabled: true
  size: 20Gi

resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "2Gi"
    cpu: "2"

config:
  cluster:
    enabled: false  # Single node to start
  storage:
    performance:
      optimizers_count: 2
typescript
// Qdrant client
import { QdrantClient } from '@qdrant/js-client-rest';

const qdrant = new QdrantClient({ url: 'http://qdrant:6333' });

// Create collection for article embeddings
await qdrant.createCollection('work_embeddings', {
  vectors: {
    specter: { size: 768, distance: 'Cosine' },   // SPECTER embeddings
    title: { size: 384, distance: 'Cosine' },     // Title embeddings (MiniLM)
  },
  // Payload for hybrid filtering
  payload_schema: {
    year: { data_type: 'Integer', indexed: true },
    source: { data_type: 'Keyword', indexed: true },
    venue_type: { data_type: 'Keyword', indexed: true },
    citation_count: { data_type: 'Integer', indexed: true },
  },
});

// Semantic search with filters
const results = await qdrant.search('work_embeddings', {
  vector: {
    name: 'specter',
    vector: queryEmbedding,  // 768d vector
  },
  filter: {
    must: [
      { key: 'year', range: { gte: 2020 } },
      { key: 'source', match: { value: 'openalex' } },
    ],
  },
  limit: 20,
  with_payload: true,
});

pgvector (integrated alternative)

sql
-- pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with embeddings
ALTER TABLE canonical_works
  ADD COLUMN embedding vector(768);

-- HNSW index for efficient ANN search
CREATE INDEX ON canonical_works
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Similarity search with SQL filters
SELECT
  id, title, year,
  1 - (embedding <=> $1::vector) AS similarity
FROM canonical_works
WHERE year >= 2020
  AND type = 'article'
ORDER BY embedding <=> $1::vector
LIMIT 20;

Time-series Databases

InfluxDB vs TimescaleDB vs QuestDB

┌─────────────────────────────────────────────────────────────────────────────┐
│                      TIME-SERIES DATABASES COMPARED                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                   │ InfluxDB 3 │ TimescaleDB │ QuestDB │ ClickHouse │       │
│  ─────────────────┼────────────┼─────────────┼─────────┼────────────┤       │
│  Language         │ SQL + Flux │ SQL         │ SQL     │ SQL        │       │
│  Compression      │ ⭐⭐⭐⭐⭐   │ ⭐⭐⭐⭐      │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐⭐⭐   │       │
│  Ingestion        │ ⭐⭐⭐⭐⭐   │ ⭐⭐⭐⭐      │ ⭐⭐⭐⭐⭐ │ ⭐⭐⭐⭐⭐   │       │
│  Aggregations     │ ⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐  │ ⭐⭐⭐⭐⭐   │       │
│  PostgreSQL compat│ ❌          │ ✅          │ ✅       │ ❌         │       │
│  Kubernetes       │ ⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐    │ ⭐⭐⭐⭐    │       │
│  Open Source      │ ⚠️ (OSS 2) │ ✅          │ ✅       │ ✅         │       │
│  ─────────────────┴────────────┴─────────────┴─────────┴────────────┘       │
│                                                                              │
│  ATLAS USE CASES                                                             │
│  ───────────────                                                             │
│                                                                              │
│  Metrics to store:                                                           │
│  • Citation evolution by article (daily)                                    │
│  • Publications by field (weekly)                                           │
│  • Author h-index (monthly)                                                 │
│  • Thematic trends (rolling window)                                         │
│  • Verification audit logs                                                   │
│                                                                              │
│  RECOMMENDATION: TimescaleDB                                                 │
│  ────────────────────────────                                                │
│  • PostgreSQL extension → single database to manage                         │
│  • Standard SQL                                                              │
│  • Continuous aggregates for rollups                                        │
│  • Automatic compression                                                     │
│  • Official Helm chart (timescale/timescaledb-single)                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

TimescaleDB Schema for Metrics

sql
-- TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Citation metrics by article
CREATE TABLE citation_metrics (
  time TIMESTAMPTZ NOT NULL,
  work_id UUID NOT NULL,
  citation_count INTEGER,
  reference_count INTEGER,
  influential_count INTEGER,
  source TEXT
);

SELECT create_hypertable('citation_metrics', 'time');

-- Compression after 7 days
ALTER TABLE citation_metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'work_id'
);

SELECT add_compression_policy('citation_metrics', INTERVAL '7 days');

-- Continuous aggregate for weekly trends
CREATE MATERIALIZED VIEW citation_weekly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 week', time) AS week,
  work_id,
  MAX(citation_count) AS max_citations,
  AVG(citation_count) AS avg_citations
FROM citation_metrics
GROUP BY week, work_id;

-- Metrics by field
CREATE TABLE domain_metrics (
  time TIMESTAMPTZ NOT NULL,
  domain TEXT NOT NULL,
  publication_count INTEGER,
  preprint_count INTEGER,
  avg_citations NUMERIC,
  top_venues JSONB
);

SELECT create_hypertable('domain_metrics', 'time');

-- Query: field evolution
SELECT
  time_bucket('1 month', time) AS month,
  domain,
  SUM(publication_count) AS total_publications,
  AVG(avg_citations) AS avg_citations
FROM domain_metrics
WHERE domain = 'machine learning'
  AND time > NOW() - INTERVAL '5 years'
GROUP BY month, domain
ORDER BY month;

Geospatial Databases

PostGIS vs H3 vs DuckDB Spatial

┌─────────────────────────────────────────────────────────────────────────────┐
│                          GEOSPATIAL OPTIONS                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  POSTGIS (PostgreSQL Extension)                                              │
│  ──────────────────────────────                                              │
│  ✓ De facto standard                                                         │
│  ✓ All geometry types                                                       │
│  ✓ Very efficient GIST indexes                                              │
│  ✓ Complete spatial functions                                               │
│  → RECOMMENDED for Atlas (already on PostgreSQL)                            │
│                                                                              │
│  H3 (Uber Hexagonal Grid)                                                    │
│  ────────────────────────                                                    │
│  ✓ Hierarchical hexagonal indexing                                          │
│  ✓ Excellent for geo aggregations                                           │
│  ✓ PostgreSQL extension available                                           │
│  → Useful for heatmaps and geographic clustering                            │
│                                                                              │
│  ATLAS QUERIES                                                               │
│  ─────────────                                                               │
│  • Institutions within a radius                                             │
│  • Geographic collaborations                                                 │
│  • Publications heatmap by region                                           │
│  • Lab clustering by proximity                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

PostGIS Schema for Atlas

sql
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS h3;  -- Optional for aggregations

-- Institutions with location
ALTER TABLE institutions
  ADD COLUMN location GEOMETRY(Point, 4326);

CREATE INDEX idx_institutions_geo ON institutions USING GIST (location);

-- Query: labs within 100km radius of Paris
SELECT
  i.name,
  i.country,
  ST_Distance(
    i.location::geography,
    ST_MakePoint(2.3522, 48.8566)::geography
  ) / 1000 AS distance_km,
  COUNT(DISTINCT wa.work_id) AS publications
FROM institutions i
JOIN work_authorships wa ON wa.affiliation_institution_id = i.id
WHERE ST_DWithin(
  i.location::geography,
  ST_MakePoint(2.3522, 48.8566)::geography,
  100000  -- 100km in meters
)
GROUP BY i.id
ORDER BY distance_km;

-- H3 aggregation for heatmap
SELECT
  h3_lat_lng_to_cell(
    ST_Y(i.location),
    ST_X(i.location),
    5  -- H3 resolution
  ) AS h3_cell,
  COUNT(*) AS institution_count,
  SUM(i.works_count) AS total_works
FROM institutions i
WHERE i.location IS NOT NULL
GROUP BY h3_cell;

Search Engines

Elasticsearch vs OpenSearch vs Meilisearch

┌─────────────────────────────────────────────────────────────────────────────┐
│                       SEARCH ENGINES COMPARED                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                   │ OpenSearch  │ Elasticsearch │ Meilisearch │ Typesense │ │
│  ─────────────────┼─────────────┼───────────────┼─────────────┼───────────┤ │
│  License          │ Apache 2.0  │ SSPL/Elastic  │ MIT         │ GPL-3     │ │
│  Full-text        │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐      │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐  │ │
│  Aggregations     │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐      │ ⭐⭐⭐        │ ⭐⭐⭐     │ │
│  Vector search    │ ⭐⭐⭐⭐     │ ⭐⭐⭐⭐⭐      │ ⭐⭐⭐        │ ⭐⭐⭐     │ │
│  Multitenancy     │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐      │ ⭐⭐⭐⭐      │ ⭐⭐⭐⭐   │ │
│  Kubernetes       │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐      │ ⭐⭐⭐⭐      │ ⭐⭐⭐     │ │
│  Resources        │ ⭐⭐         │ ⭐⭐           │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐   │ │
│  Simplicity       │ ⭐⭐⭐       │ ⭐⭐⭐         │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐   │ │
│  Advanced scoring │ ⭐⭐⭐⭐⭐    │ ⭐⭐⭐⭐⭐      │ ⭐⭐⭐        │ ⭐⭐⭐     │ │
│  Built-in ML      │ ⭐⭐⭐⭐     │ ⭐⭐⭐⭐⭐      │ ❌           │ ❌        │ │
│  ─────────────────┴─────────────┴───────────────┴─────────────┴───────────┘ │
│                                                                              │
│  ATLAS RECOMMENDATION                                                        │
│  ────────────────────                                                        │
│                                                                              │
│  OpenSearch for advanced search                                              │
│  ────────────────────────────────                                            │
│  • Apache 2.0 fork of Elasticsearch                                         │
│  • Equivalent features                                                       │
│  • No license risk                                                           │
│  • Excellent AWS support (if cloud)                                          │
│  • Helm chart: opensearch-project/opensearch                                │
│  • Operator: opensearch-project/opensearch-k8s-operator                     │
│                                                                              │
│  Meilisearch as complement for instant UX                                   │
│  ────────────────────────────────────────────                                │
│  • Ultra-fast autocomplete                                                  │
│  • Typo-tolerance                                                           │
│  • Very low footprint                                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

OpenSearch Architecture for Atlas

yaml
# OpenSearch cluster manifest
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: atlas-search
  namespace: atlas-verify
spec:
  general:
    serviceName: atlas-search
    version: 2.11.0
    httpPort: 9200
    pluginsList:
      - analysis-icu
      - analysis-kuromoji  # Japanese
      - analysis-smartcn   # Chinese

  dashboards:
    enable: true
    replicas: 1

  nodePools:
    - component: masters
      replicas: 3
      roles:
        - master
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
      persistence:
        emptyDir: {}

    - component: data
      replicas: 2
      roles:
        - data
      resources:
        requests:
          memory: "2Gi"
          cpu: "1"
      persistence:
        pvc:
          storageClass: local-path
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi

OpenSearch Multi-entity Mapping

json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "title_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "synonym_filter"]
        },
        "author_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "name_synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "ML, machine learning",
            "DL, deep learning",
            "NLP, natural language processing",
            "AI, artificial intelligence"
          ]
        },
        "name_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "university, univ",
            "laboratory, lab"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "_entity_type": { "type": "keyword" },

      // ══════════════════════════════════════════════════════
      // WORK FIELDS
      // ══════════════════════════════════════════════════════
      "title": {
        "type": "text",
        "analyzer": "title_analyzer",
        "fields": {
          "exact": { "type": "keyword" },
          "suggest": {
            "type": "completion",
            "contexts": [
              { "name": "source", "type": "category" }
            ]
          }
        }
      },
      "abstract": {
        "type": "text",
        "analyzer": "english"
      },
      "year": { "type": "integer" },
      "doi": { "type": "keyword" },
      "work_type": { "type": "keyword" },
      "venue": {
        "type": "text",
        "fields": { "exact": { "type": "keyword" } }
      },
      "citation_count": { "type": "integer" },
      "keywords": { "type": "keyword" },
      "source": { "type": "keyword" },

      // Embeddings for vector search
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      },

      // ══════════════════════════════════════════════════════
      // AUTHOR FIELDS
      // ══════════════════════════════════════════════════════
      "display_name": {
        "type": "text",
        "analyzer": "author_analyzer",
        "fields": {
          "exact": { "type": "keyword" },
          "suggest": { "type": "completion" }
        }
      },
      "orcid": { "type": "keyword" },
      "h_index": { "type": "integer" },
      "works_count": { "type": "integer" },
      "affiliations": {
        "type": "nested",
        "properties": {
          "institution": { "type": "text" },
          "ror": { "type": "keyword" },
          "country": { "type": "keyword" }
        }
      },
      "topics": { "type": "keyword" },

      // ══════════════════════════════════════════════════════
      // INSTITUTION FIELDS
      // ══════════════════════════════════════════════════════
      "institution_name": {
        "type": "text",
        "analyzer": "author_analyzer",
        "fields": { "exact": { "type": "keyword" } }
      },
      "ror": { "type": "keyword" },
      "institution_type": { "type": "keyword" },
      "country": { "type": "keyword" },
      "location": { "type": "geo_point" },

      // ══════════════════════════════════════════════════════
      // COMMON
      // ══════════════════════════════════════════════════════
      "created_at": { "type": "date" },
      "updated_at": { "type": "date" }
    }
  }
}

Advanced OpenSearch Queries

typescript
// OpenSearch client
import { Client } from '@opensearch-project/opensearch';

const client = new Client({ node: 'http://atlas-search:9200' });

// ══════════════════════════════════════════════════════════════════════════
// 1. MULTI-ENTITY SEARCH
// ══════════════════════════════════════════════════════════════════════════

const multiEntitySearch = async (query: string) => {
  const response = await client.search({
    index: 'atlas',
    body: {
      query: {
        bool: {
          should: [
            // Search on works (title, abstract)
            {
              bool: {
                must: { multi_match: { query, fields: ['title^3', 'abstract', 'keywords^2'] } },
                filter: { term: { _entity_type: 'work' } }
              }
            },
            // Search on authors
            {
              bool: {
                must: { match: { display_name: { query, fuzziness: 'AUTO' } } },
                filter: { term: { _entity_type: 'author' } }
              }
            },
            // Search on institutions
            {
              bool: {
                must: { match: { institution_name: { query, fuzziness: 'AUTO' } } },
                filter: { term: { _entity_type: 'institution' } }
              }
            }
          ]
        }
      },
      // Aggregations by entity type
      aggs: {
        by_entity_type: {
          terms: { field: '_entity_type' },
          aggs: {
            top_hits: { top_hits: { size: 5 } }
          }
        }
      },
      highlight: {
        fields: {
          title: {},
          abstract: { fragment_size: 200 },
          display_name: {},
          institution_name: {}
        }
      }
    }
  });

  return response.body;
};

// ══════════════════════════════════════════════════════════════════════════
// 2. EXPERT ON A TOPIC IN AN AREA
// ══════════════════════════════════════════════════════════════════════════

const findExperts = async (topic: string, location: { lat: number; lon: number }, radiusKm: number) => {
  const response = await client.search({
    index: 'atlas',
    body: {
      query: {
        bool: {
          must: [
            { term: { _entity_type: 'author' } },
            { match: { topics: topic } }
          ],
          filter: [
            {
              nested: {
                path: 'affiliations',
                query: {
                  geo_distance: {
                    distance: `${radiusKm}km`,
                    'affiliations.location': location
                  }
                }
              }
            }
          ]
        }
      },
      sort: [
        { h_index: 'desc' },
        { works_count: 'desc' }
      ],
      size: 20
    }
  });

  return response.body.hits.hits;
};

// ══════════════════════════════════════════════════════════════════════════
// 3. SEMANTIC SEARCH (VECTOR + KEYWORDS)
// ══════════════════════════════════════════════════════════════════════════

const semanticSearch = async (queryEmbedding: number[], keywords: string[], filters: any) => {
  const response = await client.search({
    index: 'atlas',
    body: {
      query: {
        bool: {
          must: [
            { term: { _entity_type: 'work' } },
            // Vector search (k-NN)
            {
              knn: {
                embedding: {
                  vector: queryEmbedding,
                  k: 50
                }
              }
            }
          ],
          should: [
            // Boost if keywords match
            { terms: { keywords: keywords, boost: 2 } }
          ],
          filter: filters
        }
      },
      // Rescoring to combine BM25 and vector
      rescore: {
        window_size: 100,
        query: {
          rescore_query: {
            multi_match: {
              query: keywords.join(' '),
              fields: ['title^2', 'abstract']
            }
          },
          query_weight: 0.3,
          rescore_query_weight: 0.7
        }
      }
    }
  });

  return response.body.hits.hits;
};

// ══════════════════════════════════════════════════════════════════════════
// 4. MULTI-ENTITY AUTOCOMPLETE
// ══════════════════════════════════════════════════════════════════════════

const autocomplete = async (prefix: string) => {
  const response = await client.search({
    index: 'atlas',
    body: {
      suggest: {
        title_suggest: {
          prefix,
          completion: {
            field: 'title.suggest',
            size: 5,
            contexts: { source: ['openalex', 'crossref'] }
          }
        },
        author_suggest: {
          prefix,
          completion: {
            field: 'display_name.suggest',
            size: 5
          }
        }
      }
    }
  });

  return {
    works: response.body.suggest.title_suggest[0].options,
    authors: response.body.suggest.author_suggest[0].options
  };
};

// ══════════════════════════════════════════════════════════════════════════
// 5. FACETS AND AGGREGATIONS
// ══════════════════════════════════════════════════════════════════════════

const searchWithFacets = async (query: string) => {
  const response = await client.search({
    index: 'atlas',
    body: {
      query: {
        bool: {
          must: [
            { term: { _entity_type: 'work' } },
            { multi_match: { query, fields: ['title', 'abstract'] } }
          ]
        }
      },
      aggs: {
        // Facet by year
        by_year: {
          histogram: {
            field: 'year',
            interval: 1,
            min_doc_count: 1
          }
        },
        // Facet by type
        by_type: {
          terms: { field: 'work_type', size: 10 }
        },
        // Facet by source
        by_source: {
          terms: { field: 'source', size: 10 }
        },
        // Facet by venue (top 20)
        by_venue: {
          terms: { field: 'venue.exact', size: 20 }
        },
        // Citation stats
        citation_stats: {
          stats: { field: 'citation_count' }
        },
        // Citation distribution
        citation_ranges: {
          range: {
            field: 'citation_count',
            ranges: [
              { to: 10 },
              { from: 10, to: 50 },
              { from: 50, to: 100 },
              { from: 100, to: 500 },
              { from: 500 }
            ]
          }
        }
      },
      size: 20
    }
  });

  return {
    hits: response.body.hits,
    facets: response.body.aggregations
  };
};

Multi-database Federation Architecture

Problem Statement

How to execute a single query across multiple databases (PostgreSQL, OpenSearch, Qdrant, TimescaleDB)?

┌─────────────────────────────────────────────────────────────────────────────┐
│                           QUERY FEDERATION                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  USER QUERY                                                                  │
│  ──────────                                                                  │
│  "Experts in ethical machine learning, in Paris, with recent highly         │
│   cited publications, and articles similar to my draft"                     │
│                                                                              │
│  DECOMPOSITION                                                               │
│  ─────────────                                                               │
│  1. Full-text "machine learning ethics" → OpenSearch                        │
│  2. Geo "Paris 50km" → PostgreSQL/PostGIS                                   │
│  3. Time-series "recent publications" → TimescaleDB                         │
│  4. Vector "similar to draft" → Qdrant                                      │
│  5. Graph "co-author network" → PostgreSQL or ArangoDB                      │
│  6. Final aggregation → Application layer                                   │
│                                                                              │
│  APPROACHES                                                                  │
│  ──────────                                                                  │
│  A. Query Federation (application layer)                                    │
│  B. Data Virtualization (Trino/Presto)                                      │
│  C. Polyglot Persistence with Event Sourcing                                │
│  D. GraphQL Federation                                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Approach A: Query Federation (Application Layer)

typescript
// ═══════════════════════════════════════════════════════════════════════════
// FEDERATED QUERY SERVICE
// ═══════════════════════════════════════════════════════════════════════════

import { Effect, pipe } from 'effect';

interface FederatedQuery {
  text?: string;              // Full-text query
  embedding?: number[];       // Vector query
  geo?: {
    lat: number;
    lon: number;
    radiusKm: number;
  };
  timeRange?: {
    from: Date;
    to: Date;
  };
  filters?: Record<string, unknown>;
  entityTypes?: ('work' | 'author' | 'institution')[];
  limit?: number;
}

interface QueryPlan {
  steps: QueryStep[];
  mergeStrategy: 'union' | 'intersection' | 'weighted';
}

interface QueryStep {
  source: 'postgres' | 'opensearch' | 'qdrant' | 'timescale';
  query: unknown;
  weight: number;
  dependsOn?: string[];
}

class FederatedQueryService {
  constructor(
    private readonly postgres: PostgresClient,
    private readonly opensearch: OpenSearchClient,
    private readonly qdrant: QdrantClient,
    private readonly timescale: TimescaleClient
  ) {}

  /**
   * Executes a federated query across all relevant sources
   */
  execute = (query: FederatedQuery): Effect.Effect<FederatedResult, QueryError> => {
    return Effect.gen(this, function* () {
      // 1. Query planning
      const plan = this.planQuery(query);

      // 2. Parallel execution of independent queries
      const independentSteps = plan.steps.filter(s => !s.dependsOn?.length);
      const dependentSteps = plan.steps.filter(s => s.dependsOn?.length);

      // Parallel execution
      const independentResults = yield* Effect.all(
        independentSteps.map(step => this.executeStep(step)),
        { concurrency: 'unbounded' }
      );

      // 3. Sequential execution of dependent steps
      let allResults = new Map<string, StepResult>();
      independentResults.forEach((result, i) => {
        allResults.set(independentSteps[i].source, result);
      });

      for (const step of dependentSteps) {
        const result = yield* this.executeStep(step, allResults);
        allResults.set(step.source, result);
      }

      // 4. Results merging
      return this.mergeResults(allResults, plan.mergeStrategy, query);
    });
  };

  private planQuery(query: FederatedQuery): QueryPlan {
    const steps: QueryStep[] = [];

    // Full-text → OpenSearch
    if (query.text) {
      steps.push({
        source: 'opensearch',
        query: this.buildOpenSearchQuery(query),
        weight: 0.4
      });
    }

    // Vector similarity → Qdrant
    if (query.embedding) {
      steps.push({
        source: 'qdrant',
        query: this.buildQdrantQuery(query),
        weight: 0.3
      });
    }

    // Geo filter → PostgreSQL/PostGIS
    if (query.geo) {
      steps.push({
        source: 'postgres',
        query: this.buildPostGISQuery(query),
        weight: 0.2
      });
    }

    // Time range aggregations → TimescaleDB
    if (query.timeRange) {
      steps.push({
        source: 'timescale',
        query: this.buildTimescaleQuery(query),
        weight: 0.1
      });
    }

    return {
      steps,
      mergeStrategy: query.embedding ? 'weighted' : 'intersection'
    };
  }

  private buildOpenSearchQuery(query: FederatedQuery): OpenSearchQuery {
    return {
      bool: {
        must: [
          { multi_match: { query: query.text, fields: ['title^3', 'abstract', 'display_name^2'] } }
        ],
        filter: query.entityTypes
          ? [{ terms: { _entity_type: query.entityTypes } }]
          : []
      }
    };
  }

  private buildQdrantQuery(query: FederatedQuery): QdrantQuery {
    return {
      vector: query.embedding!,
      limit: query.limit || 100,
      filter: query.filters
    };
  }

  private buildPostGISQuery(query: FederatedQuery): string {
    return `
      SELECT id, ST_Distance(location::geography, ST_MakePoint($1, $2)::geography) as distance
      FROM institutions
      WHERE ST_DWithin(location::geography, ST_MakePoint($1, $2)::geography, $3)
      ORDER BY distance
    `;
  }

  private buildTimescaleQuery(query: FederatedQuery): string {
    return `
      SELECT work_id, MAX(citation_count) as max_citations
      FROM citation_metrics
      WHERE time BETWEEN $1 AND $2
      GROUP BY work_id
      HAVING MAX(citation_count) > 10
      ORDER BY max_citations DESC
    `;
  }

  private async executeStep(
    step: QueryStep,
    previousResults?: Map<string, StepResult>
  ): Promise<StepResult> {
    switch (step.source) {
      case 'opensearch':
        return this.opensearch.search(step.query);
      case 'qdrant':
        return this.qdrant.search(step.query);
      case 'postgres':
        return this.postgres.query(step.query);
      case 'timescale':
        return this.timescale.query(step.query);
    }
  }

  private mergeResults(
    results: Map<string, StepResult>,
    strategy: 'union' | 'intersection' | 'weighted',
    query: FederatedQuery
  ): FederatedResult {
    // Collect all IDs from each source
    const idScores = new Map<string, { score: number; sources: string[] }>();

    for (const [source, result] of results) {
      const weight = this.getSourceWeight(source, query);

      for (const hit of result.hits) {
        const existing = idScores.get(hit.id) || { score: 0, sources: [] };
        existing.score += hit.score * weight;
        existing.sources.push(source);
        idScores.set(hit.id, existing);
      }
    }

    // Apply merge strategy
    let finalIds: string[];

    switch (strategy) {
      case 'union':
        finalIds = Array.from(idScores.keys());
        break;

      case 'intersection':
        const sourceCount = results.size;
        finalIds = Array.from(idScores.entries())
          .filter(([_, v]) => v.sources.length === sourceCount)
          .map(([id]) => id);
        break;

      case 'weighted':
        finalIds = Array.from(idScores.entries())
          .sort((a, b) => b[1].score - a[1].score)
          .slice(0, query.limit || 50)
          .map(([id]) => id);
        break;
    }

    // Fetch full entities from primary store
    return {
      ids: finalIds,
      scores: Object.fromEntries(
        finalIds.map(id => [id, idScores.get(id)!.score])
      ),
      sources: Object.fromEntries(
        finalIds.map(id => [id, idScores.get(id)!.sources])
      )
    };
  }

  private getSourceWeight(source: string, query: FederatedQuery): number {
    // Dynamic weights based on query type
    if (query.embedding && source === 'qdrant') return 0.5;
    if (query.text && source === 'opensearch') return 0.4;
    if (query.geo && source === 'postgres') return 0.3;
    return 0.2;
  }
}

Approach B: Trino (Data Virtualization)

┌─────────────────────────────────────────────────────────────────────────────┐
│                           TRINO (PRESTO FORK)                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PRINCIPLE                                                                   │
│  ─────────                                                                   │
│  Distributed SQL engine that federates multiple data sources                │
│  via connectors (catalog).                                                  │
│                                                                              │
│  ARCHITECTURE                                                                │
│  ────────────                                                                │
│                                                                              │
│  ┌─────────┐                                                                 │
│  │  Client │                                                                 │
│  │  (SQL)  │                                                                 │
│  └────┬────┘                                                                 │
│       │                                                                      │
│       ▼                                                                      │
│  ┌─────────────────────────────────────────────────────────┐                │
│  │              TRINO COORDINATOR                           │                │
│  │  • Parse SQL                                             │                │
│  │  • Plan query                                            │                │
│  │  • Distribute to workers                                 │                │
│  └────────────────────────┬────────────────────────────────┘                │
│                           │                                                  │
│       ┌───────────────────┼───────────────────┐                             │
│       ▼                   ▼                   ▼                             │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐                       │
│  │ Worker  │         │ Worker  │         │ Worker  │                       │
│  └────┬────┘         └────┬────┘         └────┬────┘                       │
│       │                   │                   │                             │
│       ▼                   ▼                   ▼                             │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐                       │
│  │PostgreSQL│        │OpenSearch│        │  S3     │                       │
│  │ Catalog │         │ Catalog │         │ Catalog │                       │
│  └─────────┘         └─────────┘         └─────────┘                       │
│                                                                              │
│  AVAILABLE CONNECTORS                                                        │
│  ────────────────────                                                        │
│  ✓ PostgreSQL                                                                │
│  ✓ MongoDB                                                                   │
│  ✓ Elasticsearch/OpenSearch                                                 │
│  ✓ Redis                                                                     │
│  ✓ S3 (Parquet, ORC, Avro)                                                  │
│  ✓ ClickHouse                                                                │
│  ✗ Qdrant (no native connector)                                             │
│  ✗ InfluxDB/TimescaleDB (limited)                                           │
│                                                                              │
│  KUBERNETES                                                                  │
│  ──────────                                                                  │
│  Helm chart: trino/trino                                                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Trino Configuration for Atlas

yaml
# catalogs/postgres.properties
connector.name=postgresql
connection-url=jdbc:postgresql://atlas-verify-db:5432/atlas
connection-user=${PG_USER}
connection-password=${PG_PASSWORD}

# catalogs/opensearch.properties
connector.name=opensearch
opensearch.host=atlas-search
opensearch.port=9200

# catalogs/mongodb.properties (if used for raw records)
connector.name=mongodb
mongodb.connection-url=mongodb://mongodb:27017
mongodb.schema-collection=_schema
sql
-- Federated Trino query
SELECT
  p.display_name,
  p.h_index,
  os.title,
  os.citation_count
FROM postgres.public.author_profiles p
JOIN opensearch.atlas.works os ON os.author_id = p.id
WHERE p.institution_country = 'FR'
  AND os.year >= 2020
ORDER BY os.citation_count DESC
LIMIT 50;

Approach C: GraphQL Federation

typescript
// ═══════════════════════════════════════════════════════════════════════════
// GRAPHQL SCHEMA FEDERATION
// ═══════════════════════════════════════════════════════════════════════════

// Unified GraphQL schema
const typeDefs = `#graphql
  type Query {
    # Multi-entity search
    search(
      query: String!
      types: [EntityType!]
      filters: SearchFilters
      limit: Int = 20
    ): SearchResult!

    # Expert search
    findExperts(
      topic: String!
      location: GeoInput
      minHIndex: Int
    ): [Author!]!

    # Similar articles
    similarWorks(
      workId: ID!
      limit: Int = 10
    ): [Work!]!

    # Temporal evolution
    domainTrends(
      domain: String!
      fromYear: Int!
      toYear: Int!
    ): [YearlyStats!]!
  }

  type SearchResult {
    works: [Work!]!
    authors: [Author!]!
    institutions: [Institution!]!
    totalCount: Int!
    facets: Facets!
  }

  type Work {
    id: ID!
    title: String!
    abstract: String
    year: Int
    doi: String
    citationCount: Int

    # Relations (resolved from GraphDB or SQL)
    authors: [Author!]!
    venue: Venue
    references: [Work!]!
    citations: [Work!]!

    # Enriched data
    similarity(to: ID!): Float @resolver(source: "qdrant")
    citationTrend: [CitationPoint!]! @resolver(source: "timescale")
  }

  type Author {
    id: ID!
    displayName: String!
    orcid: String
    hIndex: Int

    # Relations
    affiliations: [Affiliation!]!
    works(limit: Int = 10): [Work!]!
    coauthors(depth: Int = 1): [Author!]! @resolver(source: "graph")

    # Enriched data
    expertiseTopics: [String!]! @resolver(source: "ml")
    location: GeoPoint @resolver(source: "postgis")
  }

  type Institution {
    id: ID!
    name: String!
    ror: String
    country: String
    location: GeoPoint

    # Relations
    authors: [Author!]!
    works: [Work!]!

    # Enriched data
    publicationTrend: [YearlyStats!]! @resolver(source: "timescale")
  }

  input GeoInput {
    lat: Float!
    lon: Float!
    radiusKm: Float!
  }

  input SearchFilters {
    yearFrom: Int
    yearTo: Int
    sources: [String!]
    types: [String!]
    openAccess: Boolean
    minCitations: Int
  }

  enum EntityType {
    WORK
    AUTHOR
    INSTITUTION
  }
`;

// Resolvers with federation
const resolvers = {
  Query: {
    search: async (_, args, context) => {
      // Parallel execution on OpenSearch
      const [works, authors, institutions] = await Promise.all([
        context.dataSources.opensearch.searchWorks(args),
        context.dataSources.opensearch.searchAuthors(args),
        context.dataSources.opensearch.searchInstitutions(args)
      ]);

      return {
        works,
        authors,
        institutions,
        totalCount: works.length + authors.length + institutions.length,
        facets: await context.dataSources.opensearch.getFacets(args)
      };
    },

    findExperts: async (_, { topic, location, minHIndex }, context) => {
      // 1. Full-text search for the topic
      const candidates = await context.dataSources.opensearch.searchAuthors({
        query: topic,
        minHIndex
      });

      if (!location) return candidates;

      // 2. Geographic filter
      const authorIds = candidates.map(a => a.id);
      const geoFiltered = await context.dataSources.postgres.filterByLocation(
        authorIds,
        location
      );

      return candidates.filter(a => geoFiltered.includes(a.id));
    },

    similarWorks: async (_, { workId, limit }, context) => {
      // 1. Get work embedding
      const embedding = await context.dataSources.postgres.getWorkEmbedding(workId);

      // 2. ANN search in Qdrant
      const similarIds = await context.dataSources.qdrant.searchSimilar(
        embedding,
        limit,
        { excludeIds: [workId] }
      );

      // 3. Get details from PostgreSQL
      return context.dataSources.postgres.getWorksByIds(similarIds);
    }
  },

  Work: {
    authors: async (work, _, context) => {
      return context.dataSources.postgres.getWorkAuthors(work.id);
    },

    citationTrend: async (work, _, context) => {
      return context.dataSources.timescale.getCitationTrend(work.id);
    },

    similarity: async (work, { to }, context) => {
      const [emb1, emb2] = await Promise.all([
        context.dataSources.postgres.getWorkEmbedding(work.id),
        context.dataSources.postgres.getWorkEmbedding(to)
      ]);
      return cosineSimilarity(emb1, emb2);
    }
  },

  Author: {
    coauthors: async (author, { depth }, context) => {
      // Graph traversal
      return context.dataSources.graph.getCoauthorNetwork(author.id, depth);
    },

    location: async (author, _, context) => {
      const affiliation = await context.dataSources.postgres.getCurrentAffiliation(author.id);
      if (!affiliation?.institutionId) return null;

      return context.dataSources.postgres.getInstitutionLocation(affiliation.institutionId);
    }
  }
};

Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                      ATLAS MULTI-DATABASE ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                              ┌─────────────────┐                            │
│                              │   API Gateway   │                            │
│                              │   (GraphQL)     │                            │
│                              └────────┬────────┘                            │
│                                       │                                      │
│                              ┌────────┴────────┐                            │
│                              │  Query Planner  │                            │
│                              │  & Federation   │                            │
│                              └────────┬────────┘                            │
│                                       │                                      │
│          ┌────────────────────────────┼────────────────────────────┐        │
│          │                            │                            │        │
│          ▼                            ▼                            ▼        │
│  ┌───────────────┐          ┌───────────────┐          ┌───────────────┐   │
│  │  PostgreSQL   │          │  OpenSearch   │          │    Qdrant     │   │
│  │  + PostGIS    │          │               │          │   (vectors)   │   │
│  │  + TimescaleDB│          │               │          │               │   │
│  │  + pgvector   │          │               │          │               │   │
│  ├───────────────┤          ├───────────────┤          ├───────────────┤   │
│  │ • Entities    │          │ • Full-text   │          │ • Semantic    │   │
│  │ • Relations   │          │ • Facets      │          │   search      │   │
│  │ • Geo         │          │ • Autocomplete│          │ • Similarity  │   │
│  │ • Time-series │          │ • Aggregations│          │ • Clustering  │   │
│  │ • Audit trail │          │               │          │               │   │
│  └───────────────┘          └───────────────┘          └───────────────┘   │
│          │                            │                            │        │
│          └────────────────────────────┼────────────────────────────┘        │
│                                       │                                      │
│                              ┌────────┴────────┐                            │
│                              │     Redis       │                            │
│                              │  (cache, queue) │                            │
│                              └─────────────────┘                            │
│                                                                              │
│  DATA FLOW                                                                   │
│  ─────────                                                                   │
│  1. Ingestion → PostgreSQL (source of truth)                                │
│  2. CDC/Sync → OpenSearch (full-text index)                                 │
│  3. CDC/Sync → Qdrant (embeddings index)                                    │
│  4. Queries → Federation layer → Multiple backends                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Data Synchronization (CDC)

typescript
// ═══════════════════════════════════════════════════════════════════════════
// CHANGE DATA CAPTURE with Debezium
// ═══════════════════════════════════════════════════════════════════════════

// docker-compose.yml (or Kubernetes)
/*
version: '3'
services:
  debezium:
    image: debezium/connect:2.4
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: atlas-cdc
      CONFIG_STORAGE_TOPIC: connect-configs
      OFFSET_STORAGE_TOPIC: connect-offsets
      STATUS_STORAGE_TOPIC: connect-status

  # Or use Debezium Server (without Kafka)
  debezium-server:
    image: debezium/server:2.4
    environment:
      DEBEZIUM_SOURCE_CONNECTOR_CLASS: io.debezium.connector.postgresql.PostgresConnector
      DEBEZIUM_SOURCE_DATABASE_HOSTNAME: postgres
      DEBEZIUM_SOURCE_DATABASE_PORT: 5432
      DEBEZIUM_SOURCE_DATABASE_USER: debezium
      DEBEZIUM_SOURCE_DATABASE_PASSWORD: secret
      DEBEZIUM_SOURCE_DATABASE_DBNAME: atlas
      DEBEZIUM_SINK_TYPE: http
      DEBEZIUM_SINK_HTTP_URL: http://sync-service:3000/cdc
*/

// Synchronization service
import { Effect, Queue, Stream } from 'effect';

interface CDCEvent {
  op: 'c' | 'u' | 'd' | 'r';  // create, update, delete, read (snapshot)
  source: {
    table: string;
    schema: string;
  };
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
  ts_ms: number;
}

class SyncService {
  private readonly opensearch: OpenSearchClient;
  private readonly qdrant: QdrantClient;
  private readonly embedder: EmbeddingService;

  async handleCDCEvent(event: CDCEvent): Promise<void> {
    const { op, source, after, before } = event;

    switch (source.table) {
      case 'canonical_works':
        await this.syncWork(op, after, before);
        break;
      case 'author_profiles':
        await this.syncAuthor(op, after, before);
        break;
      case 'institutions':
        await this.syncInstitution(op, after, before);
        break;
    }
  }

  private async syncWork(
    op: string,
    after: Record<string, unknown> | null,
    before: Record<string, unknown> | null
  ): Promise<void> {
    if (op === 'd') {
      // Delete from OpenSearch and Qdrant
      await Promise.all([
        this.opensearch.delete('atlas', before!.id as string),
        this.qdrant.delete('work_embeddings', [before!.id as string])
      ]);
      return;
    }

    if (!after) return;

    // Generate embedding if text changed
    const needsEmbedding = op === 'c' ||
      before?.title !== after.title ||
      before?.abstract !== after.abstract;

    let embedding: number[] | undefined;
    if (needsEmbedding && (after.title || after.abstract)) {
      embedding = await this.embedder.embed(
        `${after.title} ${after.abstract}`
      );
    }

    // Sync to OpenSearch
    await this.opensearch.index('atlas', {
      id: after.id,
      body: {
        _entity_type: 'work',
        title: after.title,
        abstract: after.abstract,
        year: after.year,
        doi: after.doi,
        citation_count: after.citation_count,
        // ... other fields
      }
    });

    // Sync embedding to Qdrant
    if (embedding) {
      await this.qdrant.upsert('work_embeddings', {
        points: [{
          id: after.id as string,
          vector: { specter: embedding },
          payload: {
            year: after.year,
            source: after.source
          }
        }]
      });
    }
  }

  // Similar methods for authors and institutions...
}

Complete Kubernetes Manifests

yaml
# ═══════════════════════════════════════════════════════════════════════════
# NAMESPACE
# ═══════════════════════════════════════════════════════════════════════════
apiVersion: v1
kind: Namespace
metadata:
  name: atlas-data
  labels:
    app.kubernetes.io/name: atlas

---
# ═══════════════════════════════════════════════════════════════════════════
# POSTGRESQL (CloudNativePG with extensions)
# ═══════════════════════════════════════════════════════════════════════════
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: atlas-postgres
  namespace: atlas-data
spec:
  instances: 3

  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "512MB"
      effective_cache_size: "2GB"
      maintenance_work_mem: "256MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      work_mem: "32MB"
      min_wal_size: "1GB"
      max_wal_size: "4GB"

    # Enabled extensions
    shared_preload_libraries:
      - timescaledb
      - pg_stat_statements

  bootstrap:
    initdb:
      postInitSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis
        - CREATE EXTENSION IF NOT EXISTS postgis_topology
        - CREATE EXTENSION IF NOT EXISTS timescaledb
        - CREATE EXTENSION IF NOT EXISTS vector
        - CREATE EXTENSION IF NOT EXISTS pg_trgm
        - CREATE EXTENSION IF NOT EXISTS btree_gin

  storage:
    size: 100Gi
    storageClass: ceph-block

  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "4"

  backup:
    barmanObjectStore:
      destinationPath: "s3://atlas-backups/postgres"
      endpointURL: "http://minio.minio:9000"
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
    retentionPolicy: "30d"

  monitoring:
    enablePodMonitor: true

---
# ═══════════════════════════════════════════════════════════════════════════
# OPENSEARCH (via Operator)
# ═══════════════════════════════════════════════════════════════════════════
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: atlas-opensearch
  namespace: atlas-data
spec:
  general:
    serviceName: atlas-opensearch
    version: 2.11.0
    httpPort: 9200
    pluginsList:
      - analysis-icu
      - analysis-kuromoji
      - analysis-smartcn
    drainDataNodes: true

  dashboards:
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"

  security:
    config:
      securityConfigSecret:
        name: opensearch-security-config
      adminCredentialsSecret:
        name: opensearch-admin-credentials

  nodePools:
    - component: masters
      replicas: 3
      roles:
        - master
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "1"
      persistence:
        pvc:
          storageClass: ceph-block
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi

    - component: data
      replicas: 3
      roles:
        - data
        - ingest
      resources:
        requests:
          memory: "4Gi"
          cpu: "1"
        limits:
          memory: "8Gi"
          cpu: "4"
      persistence:
        pvc:
          storageClass: ceph-block
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
      jvm: "-Xmx4g -Xms4g"

---
# ═══════════════════════════════════════════════════════════════════════════
# QDRANT
# ═══════════════════════════════════════════════════════════════════════════
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: atlas-data
spec:
  serviceName: qdrant
  replicas: 1  # Single node to start, scale later
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.7.4
          ports:
            - containerPort: 6333
              name: http
            - containerPort: 6334
              name: grpc
          env:
            - name: QDRANT__SERVICE__GRPC_PORT
              value: "6334"
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2"
          volumeMounts:
            - name: storage
              mountPath: /qdrant/storage
          livenessProbe:
            httpGet:
              path: /
              port: 6333
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 6333
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ceph-block
        resources:
          requests:
            storage: 50Gi

---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: atlas-data
spec:
  selector:
    app: qdrant
  ports:
    - port: 6333
      targetPort: 6333
      name: http
    - port: 6334
      targetPort: 6334
      name: grpc

---
# ═══════════════════════════════════════════════════════════════════════════
# REDIS
# ═══════════════════════════════════════════════════════════════════════════
# Via Helm: helm install redis bitnami/redis -n atlas-data -f redis-values.yaml

# redis-values.yaml
# architecture: replication
# auth:
#   enabled: true
#   existingSecret: redis-password
# master:
#   persistence:
#     size: 8Gi
# replica:
#   replicaCount: 2
#   persistence:
#     size: 8Gi
# metrics:
#   enabled: true
#   serviceMonitor:
#     enabled: true

---
# ═══════════════════════════════════════════════════════════════════════════
# SYNC SERVICE (CDC Consumer)
# ═══════════════════════════════════════════════════════════════════════════
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sync-service
  namespace: atlas-data
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sync-service
  template:
    metadata:
      labels:
        app: sync-service
    spec:
      containers:
        - name: sync
          image: ghcr.io/univ-lehavre/atlas-sync:latest
          env:
            - name: POSTGRES_URL
              valueFrom:
                secretKeyRef:
                  name: atlas-secrets
                  key: postgres-url
            - name: OPENSEARCH_URL
              value: "http://atlas-opensearch:9200"
            - name: QDRANT_URL
              value: "http://qdrant:6333"
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: atlas-secrets
                  key: redis-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

User Interface Patterns

typescript
// ═══════════════════════════════════════════════════════════════════════════
// SVELTE COMPONENT - MULTI-ENTITY SEARCH
// ═══════════════════════════════════════════════════════════════════════════

// src/lib/components/Search.svelte
<script lang="ts">
  import { createQuery } from '@tanstack/svelte-query';

  interface SearchState {
    query: string;
    entityTypes: ('work' | 'author' | 'institution')[];
    filters: {
      yearFrom?: number;
      yearTo?: number;
      sources?: string[];
      openAccess?: boolean;
      location?: { lat: number; lon: number; radiusKm: number };
    };
    sort: string;
    page: number;
  }

  let state: SearchState = $state({
    query: '',
    entityTypes: ['work', 'author', 'institution'],
    filters: {},
    sort: 'relevance',
    page: 1
  });

  const searchQuery = createQuery({
    queryKey: ['search', state],
    queryFn: () => fetch('/api/search', {
      method: 'POST',
      body: JSON.stringify(state)
    }).then(r => r.json()),
    enabled: state.query.length > 2
  });

  // Autocomplete
  let suggestions = $state<Suggestion[]>([]);

  const fetchSuggestions = async (prefix: string) => {
    if (prefix.length < 2) {
      suggestions = [];
      return;
    }
    const res = await fetch(`/api/autocomplete?q=${encodeURIComponent(prefix)}`);
    suggestions = await res.json();
  };

  // Debounce
  let debounceTimer: number;
  const onInput = (e: Event) => {
    const value = (e.target as HTMLInputElement).value;
    state.query = value;

    clearTimeout(debounceTimer);
    debounceTimer = setTimeout(() => fetchSuggestions(value), 150);
  };
</script>

<div class="search-container">
  <!-- Search bar -->
  <div class="search-bar">
    <input
      type="text"
      value={state.query}
      oninput={onInput}
      placeholder="Search articles, authors, institutions..."
    />

    <!-- Autocomplete suggestions -->
    {#if suggestions.length > 0}
      <div class="suggestions">
        {#each suggestions as suggestion}
          <button onclick={() => { state.query = suggestion.text; suggestions = []; }}>
            <span class="type-badge">{suggestion.type}</span>
            <span class="text">{suggestion.text}</span>
          </button>
        {/each}
      </div>
    {/if}
  </div>

  <!-- Entity filters -->
  <div class="entity-filters">
    {#each ['work', 'author', 'institution'] as type}
      <label>
        <input
          type="checkbox"
          checked={state.entityTypes.includes(type)}
          onchange={(e) => {
            if (e.target.checked) {
              state.entityTypes = [...state.entityTypes, type];
            } else {
              state.entityTypes = state.entityTypes.filter(t => t !== type);
            }
          }}
        />
        {type === 'work' ? 'Publications' : type === 'author' ? 'Authors' : 'Institutions'}
      </label>
    {/each}
  </div>

  <!-- Results with facets -->
  <div class="results-layout">
    <!-- Facets (left) -->
    <aside class="facets">
      {#if $searchQuery.data?.facets}
        <FacetGroup
          title="Year"
          facet={$searchQuery.data.facets.by_year}
          onSelect={(year) => { state.filters.yearFrom = year; state.filters.yearTo = year; }}
        />
        <FacetGroup
          title="Source"
          facet={$searchQuery.data.facets.by_source}
          onSelect={(source) => { state.filters.sources = [source]; }}
        />
        <FacetGroup
          title="Type"
          facet={$searchQuery.data.facets.by_type}
          onSelect={(type) => { /* ... */ }}
        />
        <FacetGroup
          title="Citations"
          facet={$searchQuery.data.facets.citation_ranges}
          type="range"
        />
      {/if}
    </aside>

    <!-- Results (center) -->
    <main class="results">
      {#if $searchQuery.isLoading}
        <LoadingSpinner />
      {:else if $searchQuery.data}
        <!-- Tabs by entity type -->
        <Tabs>
          <Tab label="Publications ({$searchQuery.data.works.length})">
            {#each $searchQuery.data.works as work}
              <WorkCard {work} highlights={work._highlights} />
            {/each}
          </Tab>
          <Tab label="Authors ({$searchQuery.data.authors.length})">
            {#each $searchQuery.data.authors as author}
              <AuthorCard {author} />
            {/each}
          </Tab>
          <Tab label="Institutions ({$searchQuery.data.institutions.length})">
            {#each $searchQuery.data.institutions as institution}
              <InstitutionCard {institution} />
            {/each}
          </Tab>
        </Tabs>
      {/if}
    </main>
  </div>
</div>

Expert Map

typescript
// src/lib/components/ExpertMap.svelte
<script lang="ts">
  import { onMount } from 'svelte';
  import maplibregl from 'maplibre-gl';

  interface Props {
    topic: string;
    minHIndex?: number;
  }

  let { topic, minHIndex = 10 }: Props = $props();

  let map: maplibregl.Map;
  let mapContainer: HTMLDivElement;

  const expertsQuery = createQuery({
    queryKey: ['experts', topic, minHIndex],
    queryFn: () => fetch(`/api/experts?topic=${topic}&minHIndex=${minHIndex}`).then(r => r.json())
  });

  onMount(() => {
    map = new maplibregl.Map({
      container: mapContainer,
      style: 'https://basemaps.cartocdn.com/gl/positron-gl-style/style.json',
      center: [2.3522, 48.8566],
      zoom: 4
    });

    return () => map.remove();
  });

  // Update markers when data changes
  $effect(() => {
    if (!map || !$expertsQuery.data) return;

    // Clear existing markers
    document.querySelectorAll('.expert-marker').forEach(el => el.remove());

    // Add new markers
    for (const expert of $expertsQuery.data) {
      if (!expert.location) continue;

      const el = document.createElement('div');
      el.className = 'expert-marker';
      el.style.width = `${Math.min(20 + expert.hIndex, 50)}px`;
      el.style.height = `${Math.min(20 + expert.hIndex, 50)}px`;

      new maplibregl.Marker(el)
        .setLngLat([expert.location.lon, expert.location.lat])
        .setPopup(new maplibregl.Popup().setHTML(`
          <div class="expert-popup">
            <h3>${expert.displayName}</h3>
            <p>${expert.institution}</p>
            <p>h-index: ${expert.hIndex}</p>
            <p>${expert.worksOnTopic} publications on "${topic}"</p>
          </div>
        `))
        .addTo(map);
    }
  });
</script>

<div class="expert-map-container">
  <div class="controls">
    <input
      type="text"
      bind:value={topic}
      placeholder="Research topic..."
    />
    <input
      type="number"
      bind:value={minHIndex}
      min="0"
      max="100"
    />
  </div>

  <div bind:this={mapContainer} class="map"></div>

  <div class="legend">
    <p>Marker size proportional to h-index</p>
  </div>
</div>

<style>
  .map {
    width: 100%;
    height: 600px;
  }

  :global(.expert-marker) {
    background: rgba(59, 130, 246, 0.7);
    border-radius: 50%;
    cursor: pointer;
    transition: transform 0.2s;
  }

  :global(.expert-marker:hover) {
    transform: scale(1.2);
  }
</style>

Co-author Network Visualization

typescript
// src/lib/components/CoauthorNetwork.svelte
<script lang="ts">
  import { onMount } from 'svelte';
  import ForceGraph3D from '3d-force-graph';

  interface Props {
    authorId: string;
    depth?: number;
  }

  let { authorId, depth = 2 }: Props = $props();

  let container: HTMLDivElement;
  let graph: any;

  const networkQuery = createQuery({
    queryKey: ['coauthor-network', authorId, depth],
    queryFn: () => fetch(`/api/authors/${authorId}/network?depth=${depth}`).then(r => r.json())
  });

  onMount(() => {
    graph = ForceGraph3D()(container)
      .nodeLabel('name')
      .nodeColor(node => node.id === authorId ? '#ef4444' : '#3b82f6')
      .nodeVal(node => Math.sqrt(node.hIndex || 1))
      .linkWidth(link => Math.sqrt(link.coauthorships))
      .linkColor(() => 'rgba(100, 100, 100, 0.5)')
      .onNodeClick(node => {
        // Navigate to author profile
        window.location.href = `/authors/${node.id}`;
      });

    return () => graph._destructor();
  });

  $effect(() => {
    if (!graph || !$networkQuery.data) return;

    graph.graphData({
      nodes: $networkQuery.data.nodes.map(n => ({
        id: n.id,
        name: n.displayName,
        hIndex: n.hIndex,
        institution: n.institution
      })),
      links: $networkQuery.data.edges.map(e => ({
        source: e.from,
        target: e.to,
        coauthorships: e.coauthorships
      }))
    });
  });
</script>

<div class="network-container">
  <div class="controls">
    <label>
      Depth:
      <input type="range" min="1" max="3" bind:value={depth} />
      {depth}
    </label>
  </div>

  <div bind:this={container} class="graph"></div>

  <div class="stats">
    {#if $networkQuery.data}
      <p>{$networkQuery.data.nodes.length} authors</p>
      <p>{$networkQuery.data.edges.length} collaborations</p>
    {/if}
  </div>
</div>

<style>
  .graph {
    width: 100%;
    height: 700px;
    background: #1a1a2e;
  }
</style>
typescript
// src/lib/components/DomainTrends.svelte
<script lang="ts">
  import { Chart } from 'chart.js/auto';

  interface Props {
    domain: string;
    fromYear?: number;
    toYear?: number;
  }

  let { domain, fromYear = 2015, toYear = 2024 }: Props = $props();

  let canvas: HTMLCanvasElement;
  let chart: Chart;

  const trendsQuery = createQuery({
    queryKey: ['trends', domain, fromYear, toYear],
    queryFn: () => fetch(
      `/api/trends?domain=${domain}&from=${fromYear}&to=${toYear}`
    ).then(r => r.json())
  });

  $effect(() => {
    if (!canvas || !$trendsQuery.data) return;

    if (chart) chart.destroy();

    chart = new Chart(canvas, {
      type: 'line',
      data: {
        labels: $trendsQuery.data.map(d => d.year),
        datasets: [
          {
            label: 'Publications',
            data: $trendsQuery.data.map(d => d.publicationCount),
            borderColor: '#3b82f6',
            yAxisID: 'y'
          },
          {
            label: 'Average citations',
            data: $trendsQuery.data.map(d => d.avgCitations),
            borderColor: '#10b981',
            yAxisID: 'y1'
          }
        ]
      },
      options: {
        responsive: true,
        scales: {
          y: {
            type: 'linear',
            position: 'left',
            title: { display: true, text: 'Publications' }
          },
          y1: {
            type: 'linear',
            position: 'right',
            title: { display: true, text: 'Average citations' },
            grid: { drawOnChartArea: false }
          }
        }
      }
    });

    return () => chart?.destroy();
  });
</script>

<div class="trends-container">
  <div class="controls">
    <input type="text" bind:value={domain} placeholder="Domain..." />
    <input type="number" bind:value={fromYear} min="1990" max="2024" />
    <input type="number" bind:value={toYear} min="1990" max="2024" />
  </div>

  <canvas bind:this={canvas}></canvas>

  {#if $trendsQuery.data}
    <div class="highlights">
      <div class="stat">
        <span class="value">{$trendsQuery.data.reduce((a, b) => a + b.publicationCount, 0)}</span>
        <span class="label">Total publications</span>
      </div>
      <div class="stat">
        <span class="value">
          {(($trendsQuery.data[$trendsQuery.data.length - 1]?.publicationCount -
            $trendsQuery.data[0]?.publicationCount) /
            $trendsQuery.data[0]?.publicationCount * 100).toFixed(1)}%
        </span>
        <span class="label">Growth</span>
      </div>
    </div>
  {/if}
</div>

Summary and Recommendations

┌─────────────────────────────────────────────────────────────────────────────┐
│                        ATLAS ARCHITECTURE - SUMMARY                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TIER 1 - ESSENTIAL (v1)                                                    │
│  ───────────────────────                                                     │
│  • PostgreSQL 16 + PostGIS + TimescaleDB + pgvector                         │
│    → Primary database, source of truth                                      │
│  • OpenSearch                                                                │
│    → Full-text, facets, autocomplete                                        │
│  • Redis                                                                     │
│    → Cache, sessions, jobs                                                   │
│                                                                              │
│  TIER 2 - ENRICHMENT (v2)                                                   │
│  ────────────────────────                                                    │
│  • Qdrant                                                                    │
│    → Semantic search, similarity (if pgvector insufficient)                 │
│  • Meilisearch                                                               │
│    → Ultra-fast autocomplete (if OpenSearch too slow)                       │
│                                                                              │
│  TIER 3 - ADVANCED (v3)                                                     │
│  ─────────────────────                                                       │
│  • ArangoDB or Neo4j                                                         │
│    → Co-author graph, citations (if complex traversals needed)              │
│  • ClickHouse                                                                │
│    → Heavy analytics, reporting (if TimescaleDB insufficient)               │
│                                                                              │
│  KEY DECISIONS                                                               │
│  ─────────────                                                               │
│                                                                              │
│  PostgreSQL as pivot:                                                        │
│  • Single database to operate initially                                     │
│  • Extensions cover 80% of needs                                            │
│  • pgvector for vectors (sufficient < 10M)                                  │
│  • PostGIS for geo                                                           │
│  • TimescaleDB for time-series                                              │
│                                                                              │
│  OpenSearch for search:                                                      │
│  • Advanced full-text, multilingual                                         │
│  • Facets and aggregations                                                   │
│  • Built-in vector search (k-NN)                                            │
│  • Open source (Apache 2.0)                                                  │
│                                                                              │
│  Federation at application level:                                           │
│  • GraphQL as unified interface                                             │
│  • Intelligent query planner                                                │
│  • CDC for synchronization                                                   │
│  • No Trino (too complex for v1)                                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Resource Estimation

Componentv1 (basic)v2 (production)v3 (scale)
PostgreSQL3x2Gi, 100Gi3x4Gi, 200Gi5x8Gi, 500Gi
OpenSearch2x4Gi, 50Gi3x8Gi, 200Gi6x16Gi, 1Ti
Qdrant-1x2Gi, 50Gi3x4Gi, 200Gi
Redis1x512Mi3x1Gi3x2Gi
API/Workers4x512Mi8x1Gi16x2Gi
Total~20Gi RAM~50Gi RAM~150Gi RAM

Implementation Roadmap

PhaseDurationComponentsFeatures
v1.02 monthsPG+OS+RedisBasic search, verification
v1.51 month+ pgvectorSemantic similarity
v2.02 months+ Qdrant, MeiliAdvanced search, autocomplete
v2.51 month+ TimescaleDBTrends, metrics
v3.03 months+ ArangoDBCo-author graph, traversals