Built-in Functionality

General

We strive to ensure that PipelineDB maintains full compatibility with PostgreSQL 10.1+ and 11.0+. As a result, all of PostgreSQL’s built-in functionality is available to PipelineDB users.

Aggregates

As one of PipelineDB’s fundamental design goals is to facilitate high-performance continuous aggregation, PostgreSQL aggregate functions are fully supported for use in Continuous Views (with a couple of rare exceptions). In addition to this large suite of standard aggregates, PipelineDB has also added some of its own Continuous Aggregates that are purpose-built for continuous time-series data processing.

See Continuous Aggregates for more information about some of PipelineDB’s most useful features.

PipelineDB-specific Types

PipelineDB supports a number of native types for efficiently leveraging Probabilistic Data Structures & Algorithms on streams. You’ll likely never need to manually create tables with these types but often they’re the result of Continuous Aggregates, so they’ll be transparently created by Continuous Views. Here they are:

bloom

Bloom Filter

cmsketch

Count-Min Sketch

topk

Top-K

hyperloglog

HyperLogLog

tdigest

T-Digest

PipelineDB-specific Functions

PipelineDB ships with a number of functions that are useful for interacting with these types. They are described below.

Bloom Filter Functions

bloom_add ( bloom, expression )

Adds the given expression to the Bloom Filter.

bloom_cardinality ( bloom )

Returns the cardinality of the given Bloom Filter. This is the number of unique elements that were added to the Bloom filter, with a small margin or error.

bloom_contains ( bloom, expression )

Returns true if the Bloom filter probably contains the given value, with a small false positive rate.

bloom_intersection ( bloom, bloom, … )

Returns a Bloom filter representing the intersection of the given Bloom filters.

bloom_union ( bloom, bloom, … )

Returns a Bloom filter representing the union of the given Bloom filters.

See Bloom Filter Aggregates for aggregates that can be used to generate Bloom filters.

Top-K Functions

topk_increment ( topk, expression )

Increments the frequency of the given expression within the given topk and returns the resulting Top-K.

topk_increment ( topk, expression, weight )

Increments the frequency of the given expression by the specified weight within the given Top-K and returns the resulting Top-K.

topk ( topk )

Returns up to k tuples representing the given Top-K top-k values and their associated frequencies.

topk_freqs ( topk )

Returns up to k frequencies associated with the given Top-K top-k most frequent values.

topk_values ( topk )

Returns up to k values representing the given Top-K top-k most frequent values.

See Top-K Aggregates for aggregates that can be used to generate topk objects.

Frequency Functions

freq_add ( cmsketch, expression, weight )

Increments the frequency of the given expression by the specified weight within the given Count-Min Sketch.

freq ( cmsketch, expression )

Returns the number of times the value of expression was added to the given Count-Min Sketch, with a small margin of error.

freq_norm ( cmsketch, expression )

Returns the normalized frequency of expression in the given Count-Min Sketch, with a small margin of error.

freq_total ( cmsketch )

Returns the total number of items added to the given Count-Min Sketch.

See Frequency Tracking Aggregates for aggregates that can be used to generate cmsketches.

HyperLogLog Functions

hll_add ( hyperloglog, expression )

Adds the given expression to the HyperLogLog.

hll_cardinality ( hyperloglog )

Returns the cardinality of the given HyperLogLog, with roughly a ~0.2% margin of error.

hll_union ( hyperloglog, hyperloglog, … )

Returns a hyperloglog representing the union of the given hyperloglog.

See HyperLogLog Aggregates for aggregates that can be used to generate hyperloglog objects.

Distribution Functions

dist_add ( tdigest, expression, weight )

Increments the frequency of the given expression by the given weight in the T-Digest.

dist_cdf ( tdigest, expression )

Given a T-Digest, returns the value of its cumulative-distribution function evaluated at the value of expression, with a small margin of error.

dist_quantile ( tdigest, float )

Given a tdigest, returns the value at the given quantile, float. float must be in [0, 1].

See Distribution Aggregates for aggregates that can be used to generate tdigest objects.

Miscellaneous Functions

bucket_cardinality ( bucket_agg, bucket_id )

Returns the cardinality of the given bucket_id within the given bucket_agg.

bucket_ids ( bucket_agg )

Returns an array of all bucket ids contained within the given bucket_agg.

bucket_cardinalities ( bucket_agg )

Returns an array of cardinalities contained within the given bucket_agg, one for each bucket id.

See Miscellaneous Aggregates for aggregates that can be used to generate bucket_agg objects.

date_round ( timestamp, resolution )

“Floors” a date down to the nearest resolution (or bucket) expressed as an interval. This is typically useful for summarization. For example, to summarize events into 10-minute buckets:

CREATE VIEW v AS SELECT
  date_round(arrival_timestamp, '10 minutes') AS bucket_10m, COUNT(*) FROM stream
  GROUP BY bucket_10m;

year ( timestamp )

Truncate the given timestamp down to its year.

month ( timestamp )

Truncate the given timestamp down to its month.

day ( timestamp )

Truncate the given timestamp down to its day.

hour ( timestamp )

Truncate the given timestamp down to its hour.

minute ( timestamp )

Truncate the given timestamp down to its minute.

second ( timestamp )

Truncate the given timestamp down to its second.

set_cardinality ( array )

Returns the cardinality of the given set array. Sets can be built using set_agg.

Operational Functions

pipelinedb.activate ( name )

Acitvates the given continuous view or transform. See Activation and Deactivation for more information.

pipelinedb.deactivate ( name )

Deacitvates the given continuous view or transform. See Activation and Deactivation for more information.

pipelinedb.combine_table( continuous view name, table )

Combine the rows from the given table into the given continuous view. combine_table uses the given continuous view’s query definition to combine aggregate values from both relations with no loss of information.

combine_table can be used for purposes such as backfilling a continuous view (possibly running on a completely separate installation) by combining the backfilled rows into the “live” continuous view only once they have been fully populated.

pipelinedb.get_views ( )

Returns the set of all continuous views.

pipelinedb.get_transforms ( )

Returns the set of all continuous transforms.

pipelinedb.truncate_continuous_view ( name )

Truncates all rows from the given continuous view.

pipelinedb.version ( )

Returns a string containing all of the version information for your PipelineDB installation.

System Views

PipelineDB includes a number of system views for viewing useful information about your continuous views and transforms:

pipelinedb.views

Describes Continuous Views.

  View "pipelinedb.views"
Column |  Type   |
-------+---------+
id     | oid     |
schema | text    |
name   | text    |
active | boolean |
query  | text    |

pipelinedb.transforms

Describes Continuous Transforms.

  View "pipelinedb.transforms"
Column |  Type   |
-------+---------+
id     | oid     |
schema | text    |
name   | text    |
active | boolean |
tgfunc | text    |
tgargs | text[]  |
query  | text    |

pipelinedb.stream_readers

For each stream, shows all of the continuous queries that are reading from it.

   View "pipelinedb.transforms"
Column             |  Type     |
-------------------+-----------+
stream             | text      |
continuous_queries | text[]    |

More system views are available for viewing Statistics for PipelineDB processes, continuous queries, and streams.