Logo Citus Docs
9.1

Get Started

  • What is Citus?
    • How Far Can Citus Scale?
  • When to Use Citus
    • Multi-Tenant Database
    • Real-Time Analytics
    • Considerations for Use
    • When Citus is Inappropriate
  • Quick Tutorials
    • Multi-tenant Applications
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries
    • Real-time Analytics
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries

Install

  • Single-Machine Cluster
    • Docker (Mac or Linux)
    • Ubuntu or Debian
    • Fedora, CentOS, or Red Hat
  • Multi-Machine Cluster
    • Ubuntu or Debian
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
    • Fedora, CentOS, or Red Hat
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
  • Managed Deployment

Use-Case Guides

  • Multi-tenant Applications
    • Let’s Make an App – Ad Analytics
    • Scaling the Relational Data Model
    • Preparing Tables and Ingesting Data
      • Try it Yourself
    • Integrating Applications
    • Sharing Data Between Tenants
    • Online Changes to the Schema
    • When Data Differs Across Tenants
    • Scaling Hardware Resources
    • Dealing with Big Tenants
    • Where to Go From Here
  • Real-Time Dashboards
    • Data Model
    • Rollups
    • Expiring Old Data
    • Approximate Distinct Counts
    • Unstructured Data with JSONB
  • Timeseries Data
    • Scaling Timeseries Data on Citus
    • Automating Partition Creation

Architecture

  • Concepts
    • Nodes
      • Coordinator and Workers
    • Distributed Data
      • Table Types
        • Type 1: Distributed Tables
        • Type 2: Reference Tables
        • Type 3: Local Tables
      • Shards
        • Shard Placements
      • Co-Location
      • Parallelism
  • Citus MX
    • Data Access
    • Scaling Out a Raw Events Table
    • MX Limitations
      • Supported only via coordinator
      • Other query limitations

Develop

  • Determining Application Type
    • At a Glance
    • Examples and Characteristics
  • Choosing Distribution Column
    • Multi-Tenant Apps
      • Best Practices
    • Real-Time Apps
      • Best Practices
    • Timeseries Data
      • Best Practices
    • Table Co-Location
      • Data co-location in Citus for hash-distributed tables
      • A practical example of co-location
      • Using Regular PostgreSQL Tables
      • Distributing tables by ID
      • Distributing tables by tenant
      • Co-location means better feature support
      • Query Performance
  • Migrating an Existing App
    • Identify Distribution Strategy
      • Pick distribution key
      • Identify types of tables
    • Prepare Source Tables for Migration
      • Add distribution keys
      • Backfill newly created columns
    • Prepare Application for Citus
      • Set up Development Citus Cluster
        • Include distribution column in keys
      • Add distribution key to queries
        • Ruby on Rails
        • Django
        • ASP.NET
        • Java Hibernate
        • Other (SQL Principles)
      • Enable Secure Connections
      • Check for cross-node traffic
    • Migrate Production Data
      • Small Database Migration
      • Big Database Migration
        • Duplicate schema
        • Enable logical replication
        • Open access for network connection
        • Begin Replication
        • Switch over to Citus and stop all connections to old database
  • SQL Reference
    • Creating and Modifying Distributed Tables (DDL)
      • Creating And Distributing Tables
        • Reference Tables
        • Distributing Coordinator Data
      • Co-Locating Tables
        • Upgrading from Citus 5.x
      • Dropping Tables
      • Modifying Tables
        • Adding/Modifying Columns
        • Adding/Removing Constraints
        • Using NOT VALID Constraints
        • Adding/Removing Indices
        • Manual Modification
    • Ingesting, Modifying Data (DML)
      • Inserting Data
        • “From Select” Clause (Distributed Rollups)
        • COPY Command (Bulk load)
    • Caching Aggregations with Rollups
      • Updates and Deletion
      • Maximizing Write Performance
    • Querying Distributed Tables (SQL)
      • Aggregate Functions
        • Count (Distinct) Aggregates
        • Estimating Top N Items
      • Limit Pushdown
      • Views on Distributed Tables
      • Joins
        • Co-located joins
        • Reference table joins
        • Repartition joins
    • Query Processing
      • Distributed Query Planner
      • Distributed Query Executor
        • Adaptive Executor
        • Task Tracker Executor
        • Subquery/CTE Push-Pull Execution
      • PostgreSQL planner and executor
    • Manual Query Propagation
      • Running on all Workers
      • Running on all Shards
      • Running on all Placements
      • Limitations
    • SQL Support and Workarounds
      • Workarounds
        • JOIN a local and a distributed table
        • Temp Tables: the Workaround of Last Resort
  • Citus API
    • Citus Utility Functions
      • Table and Shard DDL
        • create_distributed_table
        • create_reference_table
        • upgrade_to_reference_table
        • mark_tables_colocated
        • create_distributed_function
        • master_create_distributed_table
        • master_create_worker_shards
        • master_create_empty_shard
      • Table and Shard DML
        • master_append_table_to_shard
        • master_apply_delete_command
        • master_modify_multiple_shards
      • Metadata / Configuration Information
        • master_add_node
        • master_update_node
        • master_set_node_property
        • master_add_inactive_node
        • master_activate_node
        • master_disable_node
        • master_add_secondary_node
        • master_remove_node
        • master_get_active_worker_nodes
        • master_get_table_metadata
        • get_shard_id_for_distribution_column
        • column_to_column_name
        • citus_relation_size
        • citus_table_size
        • citus_total_relation_size
        • citus_stat_statements_reset
      • Cluster Management And Repair Functions
        • master_copy_shard_placement
        • master_move_shard_placement
        • rebalance_table_shards
        • get_rebalance_progress
        • master_drain_node
        • replicate_table_shards
        • isolate_tenant_to_new_shard
        • citus_create_restore_point
    • Citus Tables and Views
      • Coordinator Metadata
        • Partition table
        • Shard table
        • Shard placement table
        • Worker node table
        • Distributed object table
        • Co-location group table
        • Query statistics table
        • Distributed Query Activity
      • Tables on all Nodes
        • Connection Credentials Table
        • Connection Pooling Credentials
        • Shards and Indices on MX Workers
    • Configuration Reference
      • General configuration
        • citus.max_worker_nodes_tracked (integer)
        • citus.use_secondary_nodes (enum)
        • citus.cluster_name (text)
        • citus.enable_version_checks (boolean)
        • citus.log_distributed_deadlock_detection (boolean)
        • citus.distributed_deadlock_detection_factor (floating point)
        • citus.node_conninfo (text)
        • citus.override_table_visibility (boolean)
      • Query Statistics
        • citus.stat_statements_purge_interval (integer)
        • citus.stat_statements_max (integer)
      • Data Loading
        • citus.multi_shard_commit_protocol (enum)
        • citus.shard_replication_factor (integer)
        • citus.shard_count (integer)
        • citus.shard_max_size (integer)
      • Planner Configuration
        • citus.limit_clause_row_fetch_count (integer)
        • citus.count_distinct_error_rate (floating point)
        • citus.task_assignment_policy (enum)
      • Intermediate Data Transfer
        • citus.binary_worker_copy_format (boolean)
        • citus.binary_master_copy_format (boolean)
        • citus.max_intermediate_result_size (integer)
      • DDL
        • citus.enable_ddl_propagation (boolean)
      • Executor Configuration
        • General
        • Adaptive executor configuration
        • Task tracker executor configuration
        • Real-time executor configuration (deprecated)
        • Explain output
    • Append Distribution
      • Creating and Distributing Tables
      • Expiring Data
      • Deleting Data
      • Dropping Tables
      • Data Loading
        • Bulk load using \copy
        • Incremental loads by appending to existing shards
        • Increasing data loading performance
      • Scaling Data Ingestion
        • Coordinator Node Bulk Ingestion (100k/s-200k/s)
        • Worker Node Bulk Ingestion (100k/s-1M/s)
        • Pre-processing Data in Citus
  • External Integrations
    • Ingesting Data from Kafka
      • Caveats
    • Ingesting Data from Spark
    • Business Intelligence with Tableau

Administer

  • Cluster Management
    • Choosing Cluster Size
      • Shard Count
        • Multi-Tenant SaaS Use-Case
        • Real-Time Analytics Use-Case
    • Initial Hardware Size
      • Multi-Tenant SaaS Use-Case
      • Real-Time Analytics Use-Case
    • Scaling the cluster
      • Add a worker
      • Rebalance Shards without Downtime
        • How it Works
      • Adding a coordinator
    • Dealing With Node Failures
      • Worker Node Failures
      • Coordinator Node Failures
    • Tenant Isolation
    • Viewing Query Statistics
      • Statistics Expiration
    • Resource Conservation
      • Limiting Long-Running Queries
    • Security
      • Connection Management
      • Increasing Worker Security
      • Row-Level Security
    • PostgreSQL extensions
    • Creating a New Database
  • Table Management
    • Determining Table and Relation Size
    • Vacuuming Distributed Tables
    • Analyzing Distributed Tables
  • Upgrading Citus
    • Upgrading Citus Versions
      • Patch Version Upgrade
      • Major and Minor Version Upgrades
        • Step 1. Update Citus Package
        • Step 2. Apply Update in DB
    • Upgrading PostgreSQL version from 11 to 12
      • For Every Node

Troubleshoot

  • Query Performance Tuning
    • Table Distribution and Shards
    • PostgreSQL tuning
    • Scaling Out Performance
    • Distributed Query Performance Tuning
      • General
      • Subquery/CTE Network Overhead
      • Advanced
        • Task Assignment Policy
        • Intermediate Data Transfer Format
        • Real Time Executor
        • Task Tracker Executor
    • Scaling Out Data Ingestion
      • Real-time Insert and Updates
        • Insert Throughput
        • Update Throughput
        • Insert and Update: Throughput Checklist
        • Insert and Update: Latency
      • Staging Data Temporarily
      • Bulk Copy (250K - 2M/s)
      • Citus MX (50k/s-500k/s)
  • Useful Diagnostic Queries
    • Finding which shard contains data for a specific tenant
    • Finding the distribution column for a table
    • Detecting locks
    • Querying the size of your shards
    • Querying the size of all distributed tables
    • Determining Replication Factor per Table
    • Identifying unused indices
    • Monitoring client connection count
    • Index hit rate
  • Common Error Messages
    • Relation foo is not distributed
      • Resolution
    • Could not receive query results
      • Resolution
    • Canceling the transaction since it was involved in a distributed deadlock
      • Resolution
    • Could not connect to server: Cannot assign requested address
      • Resolution
    • SSL error: certificate verify failed
      • Resolution
    • Could not connect to any active placements
      • Resolution
    • Remaining connection slots are reserved for non-replication superuser connections
      • Resolution
    • PgBouncer cannot connect to server
      • Resolution
    • Unsupported clause type
      • Resolution
    • Cannot open new connections after the first modification command within a transaction
      • Resolution
    • Cannot create uniqueness constraint
      • Resolution
    • Function create_distributed_table does not exist
      • Resolution
    • STABLE functions used in UPDATE queries cannot be called with column references
      • Resolution

FAQ

  • Frequently Asked Questions
    • Can I create primary keys on distributed tables?
    • How do I add nodes to an existing Citus cluster?
    • How does Citus handle failure of a worker node?
    • How does Citus handle failover of the coordinator node?
    • How do I ingest the results of a query into a distributed table?
    • Can I join distributed and non-distributed tables together in the same query?
    • Are there any PostgreSQL features not supported by Citus?
    • How do I choose the shard count when I hash-partition my data?
    • How do I change the shard count for a hash partitioned table?
    • How does citus support count(distinct) queries?
    • In which situations are uniqueness constraints supported on distributed tables?
    • How do I create database roles, functions, extensions etc in a Citus cluster?
    • What if a worker node’s address changes?
    • Which shard contains data for a particular tenant?
    • I forgot the distribution column of a table, how do I find it?
    • Can I distribute a table by multiple keys?
    • Why does pg_relation_size report zero bytes for a distributed table?
    • Why am I seeing an error about max_intermediate_result_size?
    • Can I run Citus on Microsoft Azure?
    • Can I run Citus on Amazon RDS?
    • What is the state of Citus on AWS?
    • Can I create a new DB in a Citus Cloud instance?
    • Can I shard by schema on Citus for multi-tenant applications?
    • How does cstore_fdw work with Citus?
    • What happened to pg_shard?

Citus Cloud

  • Overview
    • Provisioning
      • Configuring Your Plan
      • Supported Regions
    • Connecting
      • Connecting Directly to a Worker
  • Manage
    • Scaling
      • Scaling Up (increasing node size)
      • Scaling Out (adding new nodes)
        • Rebalancing
      • Scaling Connections (pgBouncer)
    • Monitoring
      • Cloud Platform Status
      • Resources Usage
        • Amazon EBS Volume Metrics
        • CPU and Network
        • PostgreSQL Write-Ahead Log
      • Formation Events Feed
      • StatsD external reporting
        • Example: Datadog with statsd
      • VividCortex External Monitoring
    • Security
      • Connecting with SSL
        • Two-Factor Authentication
      • Users and Permissions
        • Granting Privileges in Bulk
        • Granting Access to Other Schemas
      • Encryption at Rest
      • Network Perimeter Controls
        • VPC Peering
        • IP Whitelisting
    • Backup, Availability, and Replication
      • High-Availability (HA) Replication
      • Disaster Recovery (DR)
      • Comparison of HA and DR
        • Disaster Recovery takes a little extra work but gives greater reliability
        • Trade-offs between latency and reliability for backups
        • How High Availability and Disaster Recovery is used for crash recovery
      • Point-in-Time Recovery
    • Formation API
      • Authentication
      • Create Formation
      • List Formations
      • Formation Details
      • Destroy Formation
    • Upgrades
    • Logging
      • What Is Logged
      • Recent Logs
      • External Log Destinations
        • Verified Provider Settings
  • Additional Features
    • Extensions
    • Forking
      • How to Fork a Formation
      • When is it Useful
      • How it Works Internally
    • Followers
    • Custom PostgreSQL Configuration
  • Support and Billing
    • Support
    • Billing
      • Pricing
      • Payments

Articles

  • Related Articles
    • Efficient Rollup Tables with HyperLogLog in Postgres
      • Rollup tables without HLL—using GitHub events data as an example
      • Without HLL, rollup tables have a few limitations
      • HLL to the rescue
      • HLL and rollup tables in action, together
      • What kinds of queries can HLL answer?
      • A rollup table with HLL is worth a thousand rollup tables without HLL
      • Want to learn more about HLL in Postgres?
    • Distributed Distinct Count with HyperLogLog on Postgres
      • What does HLL do behind the curtains?
        • Hash all elements
        • Observe the data for rare patterns
        • Stochastic Averaging
        • More?
      • HLL in distributed systems
      • Hands on with HLL
        • Setup
        • Examples
        • Conclusion
    • Postgres Parallel Indexing in Citus
    • Real-time Event Aggregation at Scale Using Postgres with Citus
    • How Distributed Outer Joins on PostgreSQL with Citus Work
      • Distributed Outer Joins with Citus
    • Designing your SaaS Database for Scale with Postgres
    • Building a Scalable Postgres Metrics Backend using the Citus Extension
      • Time-Series Metrics
      • Events
    • Sharding a Multi-Tenant App with Postgres
      • Tenancy
      • Multi-tenancy and co-location, a perfect pair
      • In conclusion
    • Sharding Postgres with Semi-Structured Data and Its Performance Implications
      • One large table, without joins
      • Enter Citus
      • The query workload
      • Every distribution has its thorns
    • Scalable Real-time Product Search using PostgreSQL with Citus
Citus Docs
  • Docs »
  • Citus Documentation

Citus Documentation¶

Welcome to the documentation for Citus 9.1! Citus horizontally scales PostgreSQL across commodity servers using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable real-time responses on large datasets.

Getting Started

Learn the Citus architecture, install locally, and follow some ten-minute tutorials.

Use Cases

See how Citus allows multi-tenant applications to scale with minimal database changes.

Migrating to Citus

Move from plain PostgreSQL to Citus, and discover data modeling techniques for distributed systems.

Managed Service

Explore our secure, scalable, highly available database-as-a-service.

API / Reference

Get the most out of Citus by learning its functions and configuration.

Help and Support

See the frequently asked questions, and contact us. This is the page to get unstuck.
_images/number-one.png _images/use-cases.png _images/migrating.png _images/cloud.png _images/reference.png _images/help.png _images/logo.png _images/cloud-bill-credit.png _images/cloud-bill-ach.png

This site uses cookies for analytics, personalized content and ads. By continuing to browse this site, you agree to this use. Learn more.

Next

© Copyright 2020, Citus Data, a Microsoft Company