GalaxDB Cloud · Coming soon.

The AI-Native
Database.

SQL + vector search + local embeddings in one binary. No Pinecone. No OpenAI API. No data pipeline. Your existing psycopg2 code works unchanged.

$ curl -fsSL galaxdb.com/get | bash

// Declare an embedding column. GalaxDB generates vectors automatically on every INSERT.

CREATE TABLE docs (
  id INT PRIMARY KEY,
  body TEXT
    EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2'
    DIM 384
);

-- Embeddings computed automatically. No pipeline needed.
INSERT INTO docs (id, body) VALUES
  (1, 'machine learning and neural networks');
0.990 recall@10
258K write TPS
4.49 GB/s scan throughput
740 tests passing

// THE PROBLEM

Your AI stack has five bills.
GalaxDB is one.

Most AI apps bolt together a relational database, a vector store, a cache, object storage, and an embedding pipeline. That is $400 to $1,500 a month before the first user. GalaxDB replaces all of them with a single binary.

Today: five tools, five bills ($400-1,500 / mo)
PostgreSQL: Transactional rows
Pinecone: Vector index
Redis: Hot cache + queue
S3 + DVC: Blobs + versioning
Airflow: Embedding pipeline
5 dashboards · 5 invoices · 5 on-call rotations · data consistency is your problem

With GalaxDB: one binary (~$149 / mo)
GalaxDB v1.0 · 60 MB
OLTP rows
Vector index
Time-travel
Embeddings
Blobs
Feedback
1 dashboard · 1 invoice · 1 connection string · consistent by design

// CAPABILITIES

What you get out of the box

v1

Unified Data Atom

One row stores structured fields, JSON, full-text, dense embeddings, raw binaries, and lineage. No more fanning data across five systems.

CREATE TABLE documents (
  id INT PRIMARY KEY,
  body TEXT,
  meta JSONB,
  vec EMBEDDING MODEL 'mini-LM',
  raw BLOB
);
v1

Auto-Embedding Pipeline

EMBEDDING MODEL in DDL spawns a sidecar that handles inference, queueing, and back-pressure. You write SQL, the database does the ML plumbing.

INSERT INTO products (name, description)
VALUES ('Tent', 'Lightweight 2-person');

-- ✓ embedding generated
-- ✓ index updated
-- ✓ no Airflow, no Lambda
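
To make "queueing and back-pressure" concrete, here is a minimal sketch of the general pattern: a bounded queue sits between the writer and an embedding worker, so a writer that outruns inference blocks instead of piling up unbounded work. This is an illustration of the technique only, not GalaxDB's sidecar. `fake_embed`, `run_pipeline`, and `max_pending` are hypothetical names, and the "embedding" is a hash-derived placeholder rather than model output.

```python
import hashlib
import queue
import threading


def fake_embed(text, dim=8):
    """Placeholder for model inference: derives dim floats from a SHA-256 digest.

    NOT a real embedding; it only stands in for the slow inference step.
    dim must be <= 32 because a SHA-256 digest has 32 bytes.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def run_pipeline(rows, max_pending=2):
    """Embed (row_id, body) pairs through a bounded queue and return {row_id: vector}."""
    pending = queue.Queue(maxsize=max_pending)  # the bound is what creates back-pressure
    index = {}

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more rows
                break
            row_id, body = item
            index[row_id] = fake_embed(body)

    thread = threading.Thread(target=worker)
    thread.start()
    for row in rows:
        # put() blocks while the queue is full, slowing the producer down
        # to the speed of the embedding worker.
        pending.put(row)
    pending.put(None)
    thread.join()
    return index
```

Calling `run_pipeline([(1, "Tent"), (2, "Stove")])` returns one placeholder vector per row id; a smaller `max_pending` makes the producer wait sooner.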
v1

Time-Travel Queries

Tag a snapshot before a training run. Come back six months later and query exactly what data the model saw. Reproduce any result, debug any regression, and demonstrate compliance without extra tooling.

-- Pin a training snapshot
CREATE VERSION TAG 'train-v1' FOR TRAINING;

-- Query data as it was at that point
SELECT * FROM docs
AT VERSION 'train-v1';
v1

Training Export

One SQL command exports a versioned Lance dataset. Load directly into PyTorch with zero-copy memory mapping. No Airflow, no S3 pipeline.

-- Export as Lance dataset
CREATE VERSION TAG 'train-v2'
  FOR TRAINING
  WITH TRAINING PRECISION 'sq8';

# Python: zero-copy PyTorch
path = db.training_dataset('train-v2')

// HOW IT WORKS

Six capabilities.
Familiar SQL.

Everything ships in v1. No cloud required. No external services.

schema.sql
-- One line replaces an entire embedding pipeline
CREATE TABLE docs (
  id INT PRIMARY KEY,
  body TEXT
    EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2'
    DIM 384
);

-- Embedding computed automatically
INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');
1 row inserted
Embedding generated: 384 dims, 14ms

// BENCHMARKS

Real numbers. Real hardware.

Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GB NVMe), release build. Reproduction commands in BENCHMARKS.md.

0.990 recall@10 · HNSW on SIFT-1M, ef=200
258K write TPS · 16 threads, 1M rows, NVMe
4.49 GB/s scan throughput · PAX blocks + zone-map pruning
3 µs read p50 · warm cache, ART index
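
For readers new to the metric: recall@10 is conventionally the fraction of the true 10 nearest neighbors that the approximate index also returns, averaged over all queries. The sketch below shows that definition only. The function names are illustrative, and this is not the benchmark harness referenced in BENCHMARKS.md.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Share of the exact top-k neighbor ids that appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)


def mean_recall_at_k(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries (one id list per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)
```

Under this definition, a score of 0.990 means that on average 9.9 of the 10 exact neighbors come back from the approximate search.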

// COMPARISON

GalaxDB vs the alternatives

Other databases make you choose between SQL, vector search, and local embeddings. GalaxDB ships all three. One binary, one bill.

Full comparison with pricing →
Feature · PG + pgvector · Pinecone · Qdrant · LanceDB · GalaxDB
Full SQL queries
Vector search (HNSW)
Local embeddings (no API)
Time-travel (AT VERSION)
Training export (Lance)
Near-dedup (MinHash LSH)
Embedded mode (no server)
PostgreSQL wire protocol
Self-hosted
Single binary
Legend: Yes · Partial · No

// OPEN SOURCE

100% Apache 2.0. Join the community.

GalaxDB v1 is fully open source. Contribute, extend, and run it anywhere. No cloud lock-in. No feature gates.

Community support

Apache 2.0 · Free forever

v1.0.0-beta.1 · Public beta

Want to contribute? We welcome pull requests and issues.

// FOR DEVELOPERS

Familiar SQL.
AI-native primitives.

GalaxDB extends standard SQL with four constructs: EMBEDDING MODEL, SEMANTIC_MATCH, AT VERSION, and WHERE NOT DUPLICATE. Everything else is standard SQL that your tools already understand.

Rust core

Storage engine, WAL, HNSW, and wire protocol all written in Rust. No GC pauses, no JVM overhead.

PostgreSQL wire protocol

Your existing psycopg2, SQLAlchemy, tokio-postgres, and JDBC code works unchanged.
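
As a rough sketch of what "works unchanged" could look like from psycopg2: the helper below issues the page's SEMANTIC_MATCH query through an ordinary cursor. The host, port, database name, and user in `DSN` are placeholders (the page does not state them), and passing the search text and threshold as bound parameters is an assumption about what SEMANTIC_MATCH accepts.

```python
# Assumption: placeholder connection settings, not documented GalaxDB defaults.
DSN = "host=localhost port=5432 dbname=postgres user=postgres"

# SEMANTIC_MATCH(column, query_text, threshold), as used elsewhere on this page.
# Assumption: it accepts bound parameters like any other SQL expression.
SEARCH_SQL = "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, %s, %s)"


def semantic_search(conn, text, threshold=0.4):
    """Run a semantic search over docs.body on a psycopg2-style connection."""
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (text, threshold))
        return cur.fetchall()


def connect(dsn=DSN):
    """Open a connection with stock psycopg2 (imported lazily, only when called)."""
    import psycopg2

    return psycopg2.connect(dsn)
```

With a server running, `rows = semantic_search(connect(), "AI")` would be the whole client side; nothing in it is specific to GalaxDB beyond the SQL text.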

Embedded or server

Use as a Python library with no server (like SQLite), or run as a standalone server.

Zero external deps

Single binary. No Redis, no Kafka, no Airflow. The sidecar for embeddings is optional.

import galaxdb

# Embedded mode -- no server needed
db = galaxdb.Database("./mydata")

# Create table with auto-embedding
db.execute("""
    CREATE TABLE docs (
        id INT PRIMARY KEY,
        body TEXT EMBEDDING MODEL
            'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")

# Insert -- embeddings computed automatically
db.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning')")

# Semantic search
rows = db.execute(
    "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI', 0.4)"
)
$ curl http://localhost:9091/health
{"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true,"connections_active":0}}

One binary. All the AI primitives.

SQL + vector search + local embeddings + training export. No Pinecone. No OpenAI API. No data pipeline. Apache 2.0 open source.

Apache 2.0. Built with Rust.