GalaxDB Cloud · Coming soon. Join waitlist
The AI-Native Database.
SQL + vector search + local embeddings in one binary. No Pinecone. No OpenAI API. No data pipeline. Your existing psycopg2 code works unchanged.
// Declare an embedding column. GalaxDB generates vectors automatically on every INSERT.
// THE PROBLEM
Your AI stack has five bills.
GalaxDB is one.
Most AI apps bolt together a relational database, a vector store, a cache, object storage, and an embedding pipeline. That is $400 to $1,500 a month before the first user. GalaxDB replaces all of them with a single binary.
// CAPABILITIES
What you get out of the box
Unified Data Atom
One row stores structured fields, JSON, full-text, dense embeddings, raw binaries, and lineage. No more fanning data across five systems.
CREATE TABLE documents (
  id   INT PRIMARY KEY,
  body TEXT,
  meta JSONB,
  vec  EMBEDDING MODEL 'mini-LM',
  raw  BLOB
);
Auto-Embedding Pipeline
Declaring EMBEDDING MODEL in DDL spawns a sidecar that handles inference, queueing, and back-pressure. You write SQL; the database does the ML plumbing.
INSERT INTO products (name, description) VALUES ('Tent', 'Lightweight 2-person');
-- ✓ embedding generated
-- ✓ index updated
-- ✓ no Airflow, no Lambda
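Because the page states that existing psycopg2 code works unchanged, the INSERT above could be issued from Python roughly as follows. This is a minimal sketch: `insert_product` and `main` are illustrative helper names, and the connection settings (host, port, database name, user) are hypothetical placeholders, not values documented by GalaxDB.

```python
def insert_product(conn, name, description):
    """Insert one product row over any DB-API connection (e.g. psycopg2).

    Only the text columns are sent; per the page, the database is expected
    to generate the embedding itself.
    """
    with conn.cursor() as cur:
        # Parameterized query: the driver handles quoting.
        cur.execute(
            "INSERT INTO products (name, description) VALUES (%s, %s)",
            (name, description),
        )
    conn.commit()


def main():
    # Imported lazily so the helper above can be exercised without the driver.
    import psycopg2

    # Hypothetical connection settings; adjust to your deployment.
    conn = psycopg2.connect(host="localhost", port=5432, dbname="galaxdb", user="galax")
    try:
        insert_product(conn, "Tent", "Lightweight 2-person")
    finally:
        conn.close()
```

Calling `main()` requires a running server; the helper itself only depends on the standard DB-API cursor and commit calls.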
Time-Travel Queries
Tag a snapshot before a training run. Come back six months later and query exactly what data the model saw. Reproduce any result, debug any regression, and demonstrate compliance without extra tooling.
-- Pin a training snapshot
CREATE VERSION TAG 'train-v1' FOR TRAINING;
-- Query data as it was at that point
SELECT * FROM docs AT VERSION 'train-v1';
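To make the semantics concrete, here is a toy in-memory model of tagged snapshots: every write gets an increasing version number, a tag pins the current version, and a read at a tag sees only rows written at or before it. The class and method names are hypothetical, and the sketch says nothing about how GalaxDB's storage engine actually implements AT VERSION.

```python
class VersionedTable:
    """Append-only table with named version tags (illustrative only)."""

    def __init__(self):
        self._rows = []      # list of (version, row) in write order
        self._version = 0    # version of the most recent write
        self._tags = {}      # tag name -> pinned version

    def insert(self, row):
        self._version += 1
        self._rows.append((self._version, row))

    def tag(self, name):
        # Pin the tag to the latest written version.
        self._tags[name] = self._version

    def select(self, at_version=None):
        # Without a tag, read the latest state; with one, read the pinned state.
        limit = self._version if at_version is None else self._tags[at_version]
        return [row for version, row in self._rows if version <= limit]


docs = VersionedTable()
docs.insert({"id": 1, "body": "first"})
docs.tag("train-v1")
docs.insert({"id": 2, "body": "added after the snapshot"})

snapshot = docs.select(at_version="train-v1")  # rows visible at the tag
latest = docs.select()                         # rows visible now
```

The model covers inserts only; updates and deletes would need per-row validity ranges, which are left out to keep the idea visible.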
Training Export
One SQL command exports a versioned Lance dataset. Load directly into PyTorch with zero-copy memory mapping. No Airflow, no S3 pipeline.
-- Export as Lance dataset
CREATE VERSION TAG 'train-v2' FOR TRAINING WITH TRAINING PRECISION 'sq8';

# Python: zero-copy PyTorch
path = db.training_dataset('train-v2')
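The page does not define the 'sq8' precision setting. Assuming it refers to 8-bit scalar quantization, as the abbreviation commonly does in vector tooling, the idea can be sketched in plain Python: each float is mapped to one byte between the vector's minimum and maximum. The function names and the per-vector min/max scheme are assumptions for illustration, not GalaxDB's export format.

```python
def sq8_encode(vec):
    """Quantize a list of floats to one byte each (assumed meaning of 'sq8')."""
    lo, hi = min(vec), max(vec)
    # 255 steps between lo and hi; fall back to 1.0 for constant vectors.
    scale = (hi - lo) / 255 or 1.0
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)
    return codes, lo, scale


def sq8_decode(codes, lo, scale):
    """Reconstruct approximate floats from the byte codes."""
    return [lo + c * scale for c in codes]


vec = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, lo, scale = sq8_encode(vec)
approx = sq8_decode(codes, lo, scale)
```

Each reconstructed value differs from the original by at most half a quantization step, which is the trade made for storing one byte per dimension instead of four.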
// HOW IT WORKS
Six capabilities.
Familiar SQL.
Everything ships in v1. No cloud required. No external services.
// BENCHMARKS
Real numbers. Real hardware.
Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GB NVMe), release build. Reproduction commands in BENCHMARKS.md.
// COMPARISON
GalaxDB vs the alternatives
Other databases make you choose between SQL, vector search, and local embeddings. GalaxDB ships all three. One binary, one bill.
| Feature | PG + pgvector | Pinecone | Qdrant | LanceDB | GalaxDB |
|---|---|---|---|---|---|
| Full SQL queries | |||||
| Vector search (HNSW) | |||||
| Local embeddings (no API) | |||||
| Time-travel (AT VERSION) | |||||
| Training export (Lance) | |||||
| Near-dedup (MinHash LSH) | |||||
| Embedded mode (no server) | |||||
| PostgreSQL wire protocol | |||||
| Self-hosted | |||||
| Single binary | |||||
// OPEN SOURCE
100% Apache 2.0. Join the community.
GalaxDB v1 is fully open source. Contribute, extend, and run it anywhere. No cloud lock-in. No feature gates.
Community support
Free forever
Public beta
Want to contribute? We welcome pull requests and issues.
// FOR DEVELOPERS
Familiar SQL.
AI-native primitives.
GalaxDB extends standard SQL with four new keywords: EMBEDDING MODEL, SEMANTIC_MATCH, AT VERSION, and WHERE NOT DUPLICATE. Everything else is standard SQL your tools already understand.
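The page names MinHash LSH as the near-dedup method but does not show how WHERE NOT DUPLICATE is parameterized. As background, here is a minimal sketch of the MinHash part of that technique in plain Python: the fraction of matching signature slots estimates the Jaccard similarity of two documents' shingle sets. The shingle size, the number of hash functions, and all function names are assumptions for illustration; the LSH banding step is omitted, and this is not GalaxDB's code.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    # Seeded 64-bit hash built from BLAKE2b; each seed acts as one hash function.
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value per hash function over the whole set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature slots on which the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


sig_a = minhash_signature(shingles("lightweight two person tent for backpacking trips"))
sig_b = minhash_signature(shingles("lightweight two person tent for backpacking trips"))
sig_c = minhash_signature(shingles("cast iron skillet with a pre seasoned surface"))
```

Identical texts produce identical signatures, while texts with no shared shingles agree only on chance hash collisions; a dedup filter would compare the estimate against a similarity threshold.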
Rust core
Storage engine, WAL, HNSW, and wire protocol all written in Rust. No GC pauses, no JVM overhead.
PostgreSQL wire protocol
Your existing psycopg2, SQLAlchemy, tokio-postgres, and JDBC code works unchanged.
Embedded or server
Use it as a Python library with no server (like SQLite), or run it as a standalone server.
Zero external deps
Single binary. No Redis, no Kafka, no Airflow. The sidecar for embeddings is optional.
One binary. All the AI primitives.
SQL + vector search + local embeddings + training export. No Pinecone. No OpenAI API. No data pipeline. Apache 2.0 open source.
Apache 2.0. Built with Rust.