The manual for the development version of jq can be found here.
A jq program is a "filter": it takes an input, and produces an output. There are a lot of builtin filters for extracting a particular field of an object, or converting a number to a string, or various other standard tasks.
Filters can be combined in various ways - you can pipe the output of one filter into another filter, or collect the output of a filter into an array.
Some filters produce multiple results, for instance there's one that produces all the elements of its input array. Piping that filter into a second runs the second filter for each element of the array. Generally, things that would be done with loops and iteration in other languages are just done by gluing filters together in jq.
It's important to remember that every filter has an input and an
output. Even literals like "hello" or 42 are filters - they take an
input but always produce the same literal as output. Operations that
combine two filters, like addition, generally feed the same input to
both and combine the results. So, you can implement an averaging
filter as add / length
- feeding the input array both to the add
filter and the length
filter and dividing the results.
But that's getting ahead of ourselves. :) Let's start with something simpler:
jq filters run on a stream of JSON data. The input to jq is parsed as a sequence of whitespace-separated JSON values which are passed through the provided filter one at a time. The output(s) of the filter are written to standard out, again as a sequence of whitespace-separated JSON data.
You can affect how jq reads and writes its input and output using some command-line options:
--slurp
/-s
:Instead of running the filter for each JSON object in the input, read the entire input stream into a large array and run the filter just once.
--raw-input
/-R
:Don't parse the input as JSON. Instead, each line of text is
passed to the filter as a string. If combined with --slurp
,
then the entire input is passed to the filter as a single long
string.
--null-input
/-n
:Don't read any input at all! Instead, the filter is run once
using null
as the input. This is useful when using jq as a
simple calculator or to construct JSON data from scratch.
--compact-output
/ -c
:By default, jq pretty-prints JSON output. Using this option will result in more compact output by instead putting each JSON object on a single line.
--color-output
/ -C
and --monochrome-output
/ -M
:By default, jq outputs colored JSON if writing to a
terminal. You can force it to produce color even if writing to
a pipe or a file using -C
, and disable color with -M
.
--ascii-output
/ -a
:jq usually outputs non-ASCII Unicode codepoints as UTF-8, even if the input specified them as escape sequences (like "\u03bc"). Using this option, you can force jq to produce pure ASCII output with every non-ASCII character replaced with the equivalent escape sequence.
--raw-output
/ -r
:With this option, if the filter's result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems.
--arg name value
:This option passes a value to the jq program as a predefined
variable. If you run jq with --arg foo bar
, then $foo
is
available in the program and has the value "bar"
.
.
The absolute simplest (and least interesting) filter
is .
. This is a filter that takes its input and
produces it unchanged as output.
Since jq by default pretty-prints all output, this trivial
program can be a useful way of formatting JSON output from,
say, curl
.
jq '.' | |
Input | "Hello, world!" |
Output | "Hello, world!" |
.foo
The simplest useful filter is .foo. When given a JSON object (aka dictionary or hash) as input, it produces the value at the key "foo", or null if there's none present.
jq '.foo' | |
Input | {"foo": 42, "bar": "less interesting data"} |
Output | 42 |
jq '.foo' | |
Input | {"notfoo": true, "alsonotfoo": false} |
Output | null |
.[foo]
, .[2]
, .[10:15]
You can also look up fields of an object using syntax like
.["foo"]
(.foo above is a shorthand version of this). This
one works for arrays as well, if the key is an
integer. Arrays are zero-based (like javascript), so .[2]
returns the third element of the array.
The .[10:15]
syntax can be used to return a subarray of an
array. The array returned by .[10:15]
will be of length 5,
containing the elements from index 10 (inclusive) to index
15 (exclusive). Either index may be negative (in which case
it counts backwards from the end of the array), or omitted
(in which case it refers to the start or end of the array).
jq '.[0]' | |
Input | [{"name":"JSON", "good":true}, {"name":"XML", "good":false}] |
Output | {"name":"JSON", "good":true} |
jq '.[2]' | |
Input | [{"name":"JSON", "good":true}, {"name":"XML", "good":false}] |
Output | null |
jq '.[2:4]' | |
Input | ["a","b","c","d","e"] |
Output | ["c", "d"] |
jq '.[:3]' | |
Input | ["a","b","c","d","e"] |
Output | ["a", "b", "c"] |
jq '.[-2:]' | |
Input | ["a","b","c","d","e"] |
Output | ["d", "e"] |
.[]
If you use the .[foo]
syntax, but omit the index
entirely, it will return all of the elements of an
array. Running .[]
with the input [1,2,3]
will produce the
numbers as three separate results, rather than as a single
array.
You can also use this on an object, and it will return all the values of the object.
jq '.[]' | |
Input | [{"name":"JSON", "good":true}, {"name":"XML", "good":false}] |
Output | {"name":"JSON", "good":true} |
{"name":"XML", "good":false} |
jq '.[]' | |
Input | [] |
Output | none |
jq '.[]' | |
Input | {"a": 1, "b": 1} |
Output | 1 |
1 |
,
If two filters are separated by a comma, then the
input will be fed into both and there will be multiple
outputs: first, all of the outputs produced by the left
expression, and then all of the outputs produced by the
right. For instance, filter .foo, .bar
, produces
both the "foo" fields and "bar" fields as separate outputs.
jq '.foo, .bar' | |
Input | {"foo": 42, "bar": "something else", "baz": true} |
Output | 42 |
"something else" |
jq '.user, .projects[]' | |
Input | {"user":"stedolan", "projects": ["jq", "wikiflow"]} |
Output | "stedolan" |
"jq" | |
"wikiflow" |
jq '.[4,2]' | |
Input | ["a","b","c","d","e"] |
Output | "e" |
"c" |
|
The | operator combines two filters by feeding the output(s) of the one on the left into the input of the one on the right. It's pretty much the same as the Unix shell's pipe, if you're used to that.
If the one on the left produces multiple results, the one on
the right will be run for each of those results. So, the
expression .[] | .foo
retrieves the "foo" field of each
element of the input array.
jq '.[] | .name' | |
Input | [{"name":"JSON", "good":true}, {"name":"XML", "good":false}] |
Output | "JSON" |
"XML" |
jq supports the same set of datatypes as JSON - numbers, strings, booleans, arrays, objects (which in JSON-speak are hashes with only string keys), and "null".
Booleans, null, strings and numbers are written the same way as
in javascript. Just like everything else in jq, these simple
values take an input and produce an output - 42
is a valid jq
expression that takes an input, ignores it, and returns 42
instead.
[]
As in JSON, []
is used to construct arrays, as in
[1,2,3]
. The elements of the arrays can be any jq
expression. All of the results produced by all of the
expressions are collected into one big array. You can use it
to construct an array out of a known quantity of values (as
in [.foo, .bar, .baz]
) or to "collect" all the results of a
filter into an array (as in [.items[].name]
)
Once you understand the "," operator, you can look at jq's array
syntax in a different light: the expression [1,2,3]
is not using a
built-in syntax for comma-separated arrays, but is instead applying
the []
operator (collect results) to the expression 1,2,3 (which
produces three different results).
If you have a filter X
that produces four results,
then the expression [X]
will produce a single result, an
array of four elements.
jq '[.user, .projects[]]' | |
Input | {"user":"stedolan", "projects": ["jq", "wikiflow"]} |
Output | ["stedolan", "jq", "wikiflow"] |
{}
Like JSON, {}
is for constructing objects (aka
dictionaries or hashes), as in: {"a": 42, "b": 17}
.
If the keys are "sensible" (all alphabetic characters), then the quotes can be left off. The value can be any expression (although you may need to wrap it in parentheses if it's a complicated one), which gets applied to the {} expression's input (remember, all filters have an input and an output).
{foo: .bar}
will produce the JSON object {"foo": 42}
if given the JSON
object {"bar":42, "baz":43}
. You can use this to select
particular fields of an object: if the input is an object
with "user", "title", "id", and "content" fields and you
just want "user" and "title", you can write
{user: .user, title: .title}
Because that's so common, there's a shortcut syntax: {user, title}
.
If one of the expressions produces multiple results, multiple dictionaries will be produced. If the input's
{"user":"stedolan","titles":["JQ Primer", "More JQ"]}
then the expression
{user, title: .titles[]}
will produce two outputs:
{"user":"stedolan", "title": "JQ Primer"}
{"user":"stedolan", "title": "More JQ"}
Putting parentheses around the key means it will be evaluated as an expression. With the same input as above,
{(.user): .titles}
produces
{"stedolan": ["JQ Primer", "More JQ"]}
jq '{user, title: .titles[]}' | |
Input | {"user":"stedolan","titles":["JQ Primer", "More JQ"]} |
Output | {"user":"stedolan", "title": "JQ Primer"} |
{"user":"stedolan", "title": "More JQ"} |
jq '{(.user): .titles}' | |
Input | {"user":"stedolan","titles":["JQ Primer", "More JQ"]} |
Output | {"stedolan": ["JQ Primer", "More JQ"]} |
Some jq operator (for instance, +
) do different things
depending on the type of their arguments (arrays, numbers,
etc.). However, jq never does implicit type conversions. If you
try to add a string to an object you'll get an error message and
no result.
+
The operator +
takes two filters, applies them both
to the same input, and adds the results together. What
"adding" means depends on the types involved:
Numbers are added by normal arithmetic.
Arrays are added by being concatenated into a larger array.
Strings are added by being joined into a larger string.
Objects are added by merging, that is, inserting all
the key-value pairs from both objects into a single
combined object. If both objects contain a value for the
same key, the object on the right of the +
wins.
null
can be added to any value, and returns the other
value unchanged.
jq '.a + 1' | |
Input | {"a": 7} |
Output | 8 |
jq '.a + .b' | |
Input | {"a": [1,2], "b": [3,4]} |
Output | [1,2,3,4] |
jq '.a + null' | |
Input | {"a": 1} |
Output | 1 |
jq '.a + 1' | |
Input | {} |
Output | 1 |
jq '{a: 1} + {b: 2} + {c: 3} + {a: 42}' | |
Input | null |
Output | {"a": 42, "b": 2, "c": 3} |
-
As well as normal arithmetic subtraction on numbers, the -
operator can be used on arrays to remove all occurrences of
the second array's elements from the first array.
jq '4 - .a' | |
Input | {"a":3} |
Output | 1 |
jq '. - ["xml", "yaml"]' | |
Input | ["xml", "yaml", "json"] |
Output | ["json"] |
*
and /
These operators only work on numbers, and do the expected.
jq '10 / . * 3' | |
Input | 5 |
Output | 6 |
length
The builtin function length
gets the length of various
different types of value:
The length of a string is the number of Unicode codepoints it contains (which will be the same as its JSON-encoded length in bytes if it's pure ASCII).
The length of an array is the number of elements.
The length of an object is the number of key-value pairs.
The length of null is zero.
jq '.[] | length' | |
Input | [[1,2], "string", {"a":2}, null] |
Output | 2 |
6 | |
1 | |
0 |
keys
The builtin function keys
, when given an object, returns
its keys in an array.
The keys are sorted "alphabetically", by unicode codepoint order. This is not an order that makes particular sense in any particular language, but you can count on it being the same for any two objects with the same set of keys, regardless of locale settings.
When keys
is given an array, it returns the valid indices
for that array: the integers from 0 to length-1.
jq 'keys' | |
Input | {"abc": 1, "abcd": 2, "Foo": 3} |
Output | ["Foo", "abc", "abcd"] |
jq 'keys' | |
Input | [42,3,35] |
Output | [0,1,2] |
has
The builtin function has
returns whether the input object
has the given key, or the input array has an element at the
given index.
has($key)
has the same effect as checking whether $key
is a member of the array returned by keys
, although has
will be faster.
jq 'map(has("foo"))' | |
Input | [{"foo": 42}, {}] |
Output | [true, false] |
jq 'map(has(2))' | |
Input | [[0,1], ["a","b","c"]] |
Output | [false, true] |
to_entries
, from_entries
, with_entries
These functions convert between an object and an array of
key-value pairs. If to_entries
is passed an object, then
for each k: v
entry in the input, the output array
includes {"key": k, "value": v}
.
from_entries
does the opposite conversion, and
with_entries(foo)
is a shorthand for to_entries |
map(foo) | from_entries
, useful for doing some operation to
all keys and values of an object.
jq 'to_entries' | |
Input | {"a": 1, "b": 2} |
Output | [{"key":"a", "value":1}, {"key":"b", "value":2}] |
jq 'from_entries' | |
Input | [{"key":"a", "value":1}, {"key":"b", "value":2}] |
Output | {"a": 1, "b": 2} |
jq 'with_entries(.key |= "KEY_" + .)' | |
Input | {"a": 1, "b": 2} |
Output | {"KEY_a": 1, "KEY_b": 2} |
select
The function select(foo)
produces its input unchanged if
foo
returns true for that input, and produces no output
otherwise.
It's useful for filtering lists: '[1,2,3] | map(select(. >= 2))
'
will give you [3]
.
jq 'map(select(. >= 2))' | |
Input | [1,5,3,0,7] |
Output | [5,3,7] |
empty
empty
returns no results. None at all. Not even null
.
It's useful on occasion. You'll know if you need it :)
jq '1, empty, 2' | |
Input | null |
Output | 1 |
2 |
jq '[1,2,empty,3]' | |
Input | null |
Output | [1,2,3] |
map(x)
For any filter x
, map(x)
will run that filter for each
element of the input array, and produce the outputs a new
array. map(.+1)
will increment each element of an array of numbers.
map(x)
is equivalent to [.[] | x]
. In fact, this is how
it's defined.
jq 'map(.+1)' | |
Input | [1,2,3] |
Output | [2,3,4] |
add
The filter add
takes as input an array, and produces as
output the elements of the array added together. This might
mean summed, concatenated or merged depending on the types
of the elements of the input array - the rules are the same
as those for the +
operator (described above).
If the input is an empty array, add
returns null
.
jq 'add' | |
Input | ["a","b","c"] |
Output | "abc" |
jq 'add' | |
Input | [1, 2, 3] |
Output | 6 |
jq 'add' | |
Input | [] |
Output | null |
range
The range
function produces a range of numbers. range(4;10)
produces 6 numbers, from 4 (inclusive) to 10 (exclusive). The numbers
are produced as separate outputs. Use [range(4;10)]
to get a range as
an array.
jq 'range(2;4)' | |
Input | null |
Output | 2 |
3 |
jq '[range(2;4)]' | |
Input | null |
Output | [2,3] |
tonumber
The tonumber
function parses its input as a number. It
will convert correctly-formatted strings to their numeric
equivalent, leave numbers alone, and give an error on all other input.
jq '.[] | tonumber' | |
Input | [1, "1"] |
Output | 1 |
1 |
tostring
The tostring
function prints its input as a
string. Strings are left unchanged, and all other values are
JSON-encoded.
jq '.[] | tostring' | |
Input | [1, "1", [1]] |
Output | "1" |
"1" | |
"[1]" |
type
The type
function returns the type of its argument as a
string, which is one of null, boolean, number, string, array
or object.
jq 'map(type)' | |
Input | [0, false, [], {}, null, "hello"] |
Output | ["number", "boolean", "array", "object", "null", "string"] |
sort, sort_by
The sort
functions sorts its input, which must be an
array. Values are sorted in the following order:
null
false
true
The ordering for objects is a little complex: first they're compared by comparing their sets of keys (as arrays in sorted order), and if their keys are equal then the values are compared key by key.
sort_by
may be used to sort by a particular field of an
object, or by applying any jq filter. sort_by(foo)
compares two elements by comparing the result of foo
on
each element.
jq 'sort' | |
Input | [8,3,null,6] |
Output | [null,3,6,8] |
jq 'sort_by(.foo)' | |
Input | [{"foo":4, "bar":10}, {"foo":3, "bar":100}, {"foo":2, "bar":1}] |
Output | [{"foo":2, "bar":1}, {"foo":3, "bar":100}, {"foo":4, "bar":10}] |
group_by
group_by(.foo)
takes as input an array, groups the
elements having the same .foo
field into separate arrays,
and produces all of these arrays as elements of a larger
array, sorted by the value of the .foo
field.
Any jq expression, not just a field access, may be used in
place of .foo
. The sorting order is the same as described
in the sort
function above.
jq 'group_by(.foo)' | |
Input | [{"foo":1, "bar":10}, {"foo":3, "bar":100}, {"foo":1, "bar":1}] |
Output | [[{"foo":1, "bar":10}, {"foo":1, "bar":1}], [{"foo":3, "bar":100}]] |
min
, max
, min_by
, max_by
Find the minimum or maximum element of the input array. The
_by
versions allow you to specify a particular field or
property to examine, e.g. min_by(.foo)
finds the object
with the smallest foo
field.
jq 'min' | |
Input | [5,4,2,7] |
Output | 2 |
jq 'max_by(.foo)' | |
Input | [{"foo":1, "bar":14}, {"foo":2, "bar":3}] |
Output | {"foo":2, "bar":3} |
unique
The unique
function takes as input an array and produces
an array of the same elements, in sorted order, with
duplicates removed.
jq 'unique' | |
Input | [1,2,5,3,5,3,1,3] |
Output | [1,2,3,5] |
reverse
This function reverses an array.
jq 'reverse' | |
Input | [1,2,3,4] |
Output | [4,3,2,1] |
contains
The filter contains(b)
will produce true if b is
completely contained within the input. A string B is
contained in a string A if B is a substring of A. An array B
is contained in an array A is all elements in B are
contained in any element in A. An object B is contained in
object A if all of the values in B are contained in the
value in A with the same key. All other types are assumed to
be contained in each other if they are equal.
jq 'contains("bar")' | |
Input | "foobar" |
Output | true |
jq 'contains(["baz", "bar"])' | |
Input | ["foobar", "foobaz", "blarp"] |
Output | true |
jq 'contains(["bazzzzz", "bar"])' | |
Input | ["foobar", "foobaz", "blarp"] |
Output | false |
jq 'contains({foo: 12, bar: [{barp: 12}]})' | |
Input | {"foo": 12, "bar":[1,2,{"barp":12, "blip":13}]} |
Output | true |
jq 'contains({foo: 12, bar: [{barp: 15}]})' | |
Input | {"foo": 12, "bar":[1,2,{"barp":12, "blip":13}]} |
Output | false |
recurse
The recurse
function allows you to search through a
recursive structure, and extract interesting data from all
levels. Suppose your input represents a filesystem:
{"name": "/", "children": [
{"name": "/bin", "children": [
{"name": "/bin/ls", "children": []},
{"name": "/bin/sh", "children": []}]},
{"name": "/home", "children": [
{"name": "/home/stephen", "children": [
{"name": "/home/stephen/jq", "children": []}]}]}]}
Now suppose you want to extract all of the filenames
present. You need to retrieve .name
, .children[].name
,
.children[].children[].name
, and so on. You can do this
with:
recurse(.children[]) | .name
jq 'recurse(.foo[])' | |
Input | {"foo":[{"foo": []}, {"foo":[{"foo":[]}]}]} |
Output | {"foo":[{"foo":[]},{"foo":[{"foo":[]}]}]} |
{"foo":[]} | |
{"foo":[{"foo":[]}]} | |
{"foo":[]} |
\(foo)
Inside a string, you can put an expression inside parens after a backslash. Whatever the expression returns will be interpolated into the string.
jq '"The input was \(.), which is one less than \(.+1)"' | |
Input | 42 |
Output | "The input was 42, which is one less than 43" |
The @foo
syntax is used to format and escape strings,
which is useful for building URLs, documents in a language
like HTML or XML, and so forth. @foo
can be used as a
filter on its own, the possible escapings are:
@text
:Calls tostring
, see that function for details.
@json
:Serialises the input as JSON.
@html
:Applies HTML/XML escaping, by mapping the characters
<>&'"
to their entity equivalents <
, >
,
&
, '
, "
.
@uri
:Applies percent-encoding, by mapping all reserved URI
characters to a %xx
sequence.
@csv
:The input must be an array, and it is rendered as CSV with double quotes for strings, and quotes escaped by repetition.
@sh
:The input is escaped suitable for use in a command-line for a POSIX shell. If the input is an array, the output will be a series of space-separated strings.
@base64
:The input is converted to base64 as specified by RFC 4648.
This syntax can be combined with string interpolation in a
useful way. You can follow a @foo
token with a string
literal. The contents of the string literal will not be
escaped. However, all interpolations made inside that string
literal will be escaped. For instance,
@uri "https://www.google.com/search?q=\(.search)"
will produce the following output for the input
{"search":"jq!"}
:
https://www.google.com/search?q=jq%21
Note that the slashes, question mark, etc. in the URL are not escaped, as they were part of the string literal.
jq '@html' | |
Input | "This works if x < y" |
Output | "This works if x < y" |
jq '@sh "echo \(.)"' | |
Input | "O'Hara's Ale" |
Output | "echo 'O'\\''Hara'\\''s Ale'" |
==
, !=
The expression 'a == b' will produce 'true' if the result of a and b are equal (that is, if they represent equivalent JSON documents) and 'false' otherwise. In particular, strings are never considered equal to numbers. If you're coming from Javascript, jq's == is like Javascript's === - considering values equal only when they have the same type as well as the same value.
!= is "not equal", and 'a != b' returns the opposite value of 'a == b'
jq '.[] == 1' | |
Input | [1, 1.0, "1", "banana"] |
Output | true |
true | |
false | |
false |
if A then B else C end
will act the same as B
if A
produces a value other than false or null, but act the same
as C
otherwise.
Checking for false or null is a simpler notion of
"truthiness" than is found in Javascript or Python, but it
means that you'll sometimes have to be more explicit about
the condition you want: you can't test whether, e.g. a
string is empty using if .name then A else B end
, you'll
need something more like if (.name | count) > 0 then A else
B end
instead.
If the condition A
produces multiple results, then B
is evaluated
once for each result that is not false or null, and C
is evaluated
once for each false or null.
More cases can be added to an if using elif A then B
syntax.
jq 'if . == 0 then "zero" elif . == 1 then "one" else "many" end' | |
Input | 2 |
Output | "many" |
>, >=, <=, <
The comparison operators >
, >=
, <=
, <
return whether
their left argument is greater than, greater than or equal
to, less than or equal to or less than their right argument
(respectively).
The ordering is the same as that described for sort
, above.
jq '. < 5' | |
Input | 2 |
Output | true |
jq supports the normal Boolean operators and/or/not. They have the same standard of truth as if expressions - false and null are considered "false values", and anything else is a "true value".
If an operand of one of these operators produces multiple results, the operator itself will produce a result for each input.
not
is in fact a builtin function rather than an operator,
so it is called as a filter to which things can be piped
rather than with special syntax, as in .foo and .bar |
not
.
These three only produce the values "true" and "false", and so are only useful for genuine Boolean operations, rather than the common Perl/Python/Ruby idiom of "value_that_may_be_null or default". If you want to use this form of "or", picking between two values rather than evaluating a condition, see the "//" operator below.
jq '42 and "a string"' | |
Input | null |
Output | true |
jq '(true, false) or false' | |
Input | null |
Output | true |
false |
jq '(true, true) and (true, false)' | |
Input | null |
Output | true |
false | |
true | |
false |
jq '[true, false | not]' | |
Input | null |
Output | [false, true] |
//
A filter of the form a // b
produces the same
results as a
, if a
produces results other than false
and null
. Otherwise, a // b
produces the same results as b
.
This is useful for providing defaults: .foo // 1
will
evaluate to 1
if there's no .foo
element in the
input. It's similar to how or
is sometimes used in Python
(jq's or
operator is reserved for strictly Boolean
operations).
jq '.foo // 42' | |
Input | {"foo": 19} |
Output | 19 |
jq '.foo // 42' | |
Input | {} |
Output | 42 |
Variables are an absolute necessity in most programming languages, but they're relegated to an "advanced feature" in jq.
In most languages, variables are the only means of passing around data. If you calculate a value, and you want to use it more than once, you'll need to store it in a variable. To pass a value to another part of the program, you'll need that part of the program to define a variable (as a function parameter, object member, or whatever) in which to place the data.
It is also possible to define functions in jq, although this is
is a feature whose biggest use is defining jq's standard library
(many jq functions such as map
and find
are in fact written
in jq).
Finally, jq has a reduce
operation, which is very powerful but a
bit tricky. Again, it's mostly used internally, to define some
useful bits of jq's standard library.
In jq, all filters have an input and an output, so manual
plumbing is not necessary to pass a value from one part of a program
to the next. Many expressions, for instance a + b
, pass their input
to two distinct subexpressions (here a
and b
are both passed the
same input), so variables aren't usually necessary in order to use a
value twice.
For instance, calculating the average value of an array of numbers
requires a few variables in most languages - at least one to hold the
array, perhaps one for each element or for a loop counter. In jq, it's
simply add / length
- the add
expression is given the array and
produces its sum, and the length
expression is given the array and
produces its length.
So, there's generally a cleaner way to solve most problems in jq that
defining variables. Still, sometimes they do make things easier, so jq
lets you define variables using expression as $variable
. All
variable names start with $
. Here's a slightly uglier version of the
array-averaging example:
length as $array_length | add / $array_length
We'll need a more complicated problem to find a situation where using variables actually makes our lives easier.
Suppose we have an array of blog posts, with "author" and "title" fields, and another object which is used to map author usernames to real names. Our input looks like:
{"posts": [{"title": "Frist psot", "author": "anon"},
{"title": "A well-written article", "author": "person1"}],
"realnames": {"anon": "Anonymous Coward",
"person1": "Person McPherson"}}
We want to produce the posts with the author field containing a real name, as in:
{"title": "Frist psot", "author": "Anonymous Coward"}
{"title": "A well-written article", "author": "Person McPherson"}
We use a variable, $names, to store the realnames object, so that we can refer to it later when looking up author usernames:
.realnames as $names | .posts[] | {title, author: $names[.author]}
The expression exp as $x | ...
means: for each value of expression
exp
, run the rest of the pipeline with the entire original input, and
with $x
set to that value. Thus as
functions as something of a
foreach loop.
Variables are scoped over the rest of the expression that defines them, so
.realnames as $names | (.posts[] | {title, author: $names[.author]})
will work, but
(.realnames as $names | .posts[]) | {title, author: $names[.author]}
won't.
jq '.bar as $x | .foo | . + $x' | |
Input | {"foo":10, "bar":200} |
Output | 210 |
You can give a filter a name using "def" syntax:
def increment: . + 1;
From then on, increment
is usable as a filter just like a
builtin function (in fact, this is how some of the builtins
are defined). A function may take arguments:
def map(f): [.[] | f];
Arguments are passed as filters, not as values. The
same argument may be referenced multiple times with
different inputs (here f
is run for each element of the
input array). Arguments to a function work more like
callbacks than like value arguments.
If you want the value-argument behaviour for defining simple functions, you can just use a variable:
def addvalue(f): f as $value | map(. + $value);
With that definition, addvalue(.foo)
will add the current
input's .foo
field to each element of the array.
jq 'def addvalue(f): . + [f]; map(addvalue(.[0]))' | |
Input | [[1,2],[10,20]] |
Output | [[1,2,1], [10,20,10]] |
jq 'def addvalue(f): f as $x | map(. + $x); addvalue(.[0])' | |
Input | [[1,2],[10,20]] |
Output | [[1,2,1,2], [10,20,1,2]] |
The reduce
syntax in jq allows you to combine all of the
results of an expression by accumulating them into a single
answer. As an example, we'll pass [3,2,1]
to this expression:
reduce .[] as $item (0; . + $item)
For each result that .[]
produces, . + $item
is run to
accumulate a running total, starting from 0. In this
example, .[]
produces the results 3, 2, and 1, so the
effect is similar to running something like this:
0 | (3 as $item | . + $item) |
(2 as $item | . + $item) |
(1 as $item | . + $item)
jq 'reduce .[] as $item (0; . + $item)' | |
Input | [10,2,5,3] |
Output | 20 |
Assignment works a little differently in jq than in most programming languages. jq doesn't distinguish between references to and copies of something - two objects or arrays are either equal or not equal, without any further notion of being "the same object" or "not the same object".
If an object has two fields which are arrays, .foo
and .bar
,
and you append something to .foo
, then .bar
will not get
bigger. Even if you've just set .bar = .foo
. If you're used to
programming in languages like Python, Java, Ruby, Javascript,
etc. then you can think of it as though jq does a full deep copy
of every object before it does the assignment (for performance,
it doesn't actually do that, but that's the general idea).
=
The filter .foo = 1
will take as input an object
and produce as output an object with the "foo" field set to
1. There is no notion of "modifying" or "changing" something
in jq - all jq values are immutable. For instance,
.foo = .bar | .foo.baz = 1
will not have the side-effect of setting .bar.baz to be set to 1, as the similar-looking program in Javascript, Python, Ruby or other languages would. Unlike these languages (but like Haskell and some other functional languages), there is no notion of two arrays or objects being "the same array" or "the same object". They can be equal, or not equal, but if we change one of them in no circumstances will the other change behind our backs.
This means that it's impossible to build circular values in jq (such as an array whose first element is itself). This is quite intentional, and ensures that anything a jq program can produce can be represented in JSON.
|=
As well as the assignment operator '=', jq provides the "update" operator '|=', which takes a filter on the right-hand side and works out the new value for the property being assigned to by running the old value through this expression. For instance, .foo |= .+1 will build an object with the "foo" field set to the input's "foo" plus 1.
This example should show the difference between '=' and '|=':
Provide input '{"a": {"b": 10}, "b": 20}' to the programs:
.a = .b .a |= .b
The former will set the "a" field of the input to the "b" field of the input, and produce the output {"a": 20}. The latter will set the "a" field of the input to the "a" field's "b" field, producing {"a": 10}.
+=
, -=
, *=
, /=
, //=
jq has a few operators of the form a op= b
, which are all
equivalent to a |= . op b
. So, += 1
can be used to increment values.
jq '.foo += 1' | |
Input | {"foo": 42} |
Output | {"foo": 43} |
Lots more things are allowed on the left-hand side of a jq assignment than in most languages. We've already seen simple field accesses on the left hand side, and it's no surprise that array accesses work just as well:
.posts[0].title = "JQ Manual"
What may come as a surprise is that the expression on the left may produce multiple results, referring to different points in the input document:
.posts[].comments |= . + ["this is great"]
That example appends the string "this is great" to the "comments" array of each post in the input (where the input is an object with a field "posts" which is an array of posts).
When jq encounters an assignment like 'a = b', it records the "path" taken to select a part of the input document while executing a. This path is then used to find which part of the input to change while executing the assignment. Any filter may be used on the left-hand side of an equals - whichever paths it selects from the input will be where the assignment is performed.
This is a very powerful operation. Suppose we wanted to add a comment to blog posts, using the same "blog" input above. This time, we only want to comment on the posts written by "stedolan". We can find those posts using the "select" function described earlier:
.posts[] | select(.author == "stedolan")
The paths provided by this operation point to each of the posts that "stedolan" wrote, and we can comment on each of them in the same way that we did before:
(.posts[] | select(.author == "stedolan") | .comments) |=
. + ["terrible."]