.. _quickstart:

Quickstart
=======================

First, complete the :ref:`installation` process.

Wikipedia Traffic
-----------------

In this example we'll compute some basic statistics on a day's worth of `Wikipedia page view data`_. Each record in the dataset contains hourly page view statistics for every Wikipedia page. The record format is as follows:

.. _Wikipedia page view data: http://dumps.wikimedia.org/other/pagecounts-raw/

.. code-block:: bash

    hour | project | page title | view count | bytes served

Let's begin by creating our continuous view, using :code:`psql`:

.. code-block:: bash

    psql -c "
    CREATE FOREIGN TABLE wiki_stream (
            hour timestamp,
            project text,
            title text,
            view_count bigint,
            size bigint)
    SERVER pipelinedb;

    CREATE VIEW wiki_stats WITH (action=materialize) AS
    SELECT hour, project,
            count(*) AS total_pages,
            sum(view_count) AS total_views,
            min(view_count) AS min_views,
            max(view_count) AS max_views,
            avg(view_count) AS avg_views,
            percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views,
            sum(size) AS total_bytes_served
    FROM wiki_stream
    GROUP BY hour, project;"

Now we'll decompress the dataset as a stream and write it to :code:`stdin`, which can be used as an input to :code:`COPY`:

.. code-block:: bash

    curl -sL http://pipelinedb.com/data/wiki-pagecounts | gunzip | \
        psql -c "
        COPY wiki_stream (hour, project, title, view_count, size) FROM STDIN"

Note that this dataset is large, so the above command will run for quite a while (cancel it whenever you'd like). As it's running, select from the continuous view as it ingests data from the input stream:

.. code-block:: bash

    psql -c "
    SELECT * FROM wiki_stats ORDER BY total_views DESC"
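
Since the continuous view can be queried like any regular table, you can also drill into a single slice of the results. Here is a minimal sketch, assuming the project value ``en`` appears in the rows you've ingested so far (any other project value works the same way):

.. code-block:: bash

    psql -c "
    SELECT hour, total_views, p99_views
    FROM wiki_stats
    WHERE project = 'en'
    ORDER BY hour"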
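
When you're finished experimenting, you can tear everything down. This sketch assumes the standard PostgreSQL ``DROP`` statements apply to the continuous view and stream created above:

.. code-block:: bash

    psql -c "
    DROP VIEW wiki_stats;
    DROP FOREIGN TABLE wiki_stream;"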