Configuration Reference¶
Note
It is possible to configure Dask inline with dot notation, with YAML or via environment variables. See the conversion utility for converting the following dot notation to other forms.
Dask¶
-
dask.temporary-directory
None ¶ Temporary directory for local disk storage /tmp, /scratch, or /local. This directory is used during dask spill-to-disk operations. When the value is "null" (default), dask will create a directory from where dask was launched: `cwd/dask-worker-space`
-
dask.dataframe.shuffle-compression
None ¶ Compression algorithm used for on disk-shuffling. Partd, the library used for compression supports ZLib, BZ2, SNAPPY, and BLOSC
-
dask.array.svg.size
120 ¶ The size of pixels used when displaying a dask array as an SVG image. This is used, for example, for nice rendering in a Jupyter notebook
-
dask.optimization.fuse.active
True ¶ Turn task fusion on/off
-
dask.optimization.fuse.ave-width
1 ¶ Upper limit for width, where width = num_nodes / height, a good measure of parallelizability
-
dask.optimization.fuse.max-width
None ¶ Don't fuse if total width is greater than this. Set to null to dynamically adjust to 1.5 + ave_width * log(ave_width + 1)
-
dask.optimization.fuse.max-height
inf ¶ Don't fuse more than this many levels
-
dask.optimization.fuse.max-depth-new-edges
None ¶ Don't fuse if new dependencies are added after this many levels. Set to null to dynamically adjust to ave_width * 1.5.
-
dask.optimization.fuse.subgraphs
None ¶ Set to True to fuse multiple tasks into SubgraphCallable objects. Set to None to let the default optimizer of individual dask collections decide. If no collection-specific default exists, None defaults to False.
-
dask.optimization.fuse.rename-keys
True ¶ Set to true to rename the fused keys with `default_fused_keys_renamer`. Renaming fused keys can keep the graph more understandable and comprehensible, but it comes at the cost of additional processing. If False, then the top-most key will be used. For advanced usage, a function to create the new name is also accepted.
Distributed¶
Client¶
-
distributed.client.heartbeat
5s ¶ This value is the time between heartbeats The client sends a periodic heartbeat message to the scheduler. If it misses enough of these then the scheduler assumes that it has gone.
-
distributed.client.scheduler-info-interval
2s ¶ Interval between scheduler-info updates
Comm¶
-
distributed.comm.retry.count
0 ¶ The number of times to retry a connection
-
distributed.comm.retry.delay.min
1s ¶ The first non-zero delay between retry attempts
-
distributed.comm.retry.delay.max
20s ¶ The maximum delay between retries
-
distributed.comm.compression
auto ¶ The compression algorithm to use This could be one of lz4, snappy, zstd, or blosc
-
distributed.comm.offload
10MiB ¶ The size of message after which we choose to offload serialization to another thread In some cases, you may also choose to disable this altogether with the value false This is useful if you want to include serialization in profiling data, or if you have data types that are particularly sensitive to deserialization
-
distributed.comm.default-scheme
tcp ¶ The default protocol to use, like tcp or tls
-
distributed.comm.socket-backlog
2048 ¶ When shuffling data between workers, there can really be O(cluster size) connection requests on a single worker socket, make sure the backlog is large enough not to lose any.
-
distributed.comm.recent-messages-log-length
0 ¶ number of messages to keep for debugging
-
distributed.comm.zstd.level
3 ¶ Compression level, between 1 and 22.
-
distributed.comm.zstd.threads
0 ¶ Number of threads to use. 0 for single-threaded, -1 to infer from cpu count.
-
distributed.comm.timeouts.connect
10s ¶ No Comment
-
distributed.comm.timeouts.tcp
30s ¶ No Comment
-
distributed.comm.require-encryption
None ¶ Whether to require encryption on non-local comms
-
distributed.comm.tls.ciphers
None ¶ No Comment
-
distributed.comm.tls.ca-file
None ¶ Path to a CA file, in pem format
-
distributed.comm.tls.scheduler.cert
None ¶ Path to certificate file
-
distributed.comm.tls.scheduler.key
None ¶ Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank
-
distributed.comm.tls.worker.key
None ¶ Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank
-
distributed.comm.tls.worker.cert
None ¶ Path to certificate file
-
distributed.comm.tls.client.key
None ¶ Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank
-
distributed.comm.tls.client.cert
None ¶ Path to certificate file
Dashboard¶
-
distributed.dashboard.link
{scheme}://{host}:{port}/status ¶ The form for the dashboard links This is used wherever we print out the link for the dashboard It is filled in with relevant information like the schema, host, and port number
-
distributed.dashboard.export-tool
False ¶ No Comment
-
distributed.dashboard.graph-max-items
5000 ¶ maximum number of tasks to try to plot in "graph" view
Deploy¶
-
distributed.deploy.lost-worker-timeout
15s ¶ Interval after which to hard-close a lost worker job Otherwise we wait for a while to see if a worker will reappear
-
distributed.deploy.cluster-repr-interval
500ms ¶ Interval between calls to update cluster-repr for the widget
Scheduler¶
-
distributed.scheduler.allowed-failures
3 ¶ The number of retries before a task is considered bad When a worker dies when a task is running that task is rerun elsewhere. If many workers die while running this same task then we call the task bad, and raise a KilledWorker exception. This is the number of workers that are allowed to die before this task is marked as bad.
-
distributed.scheduler.bandwidth
100000000 ¶ The expected bandwidth between any pair of workers This is used when making scheduling decisions. The scheduler will use this value as a baseline, but also learn it over time.
-
distributed.scheduler.blocked-handlers
[] ¶ A list of handlers to exclude The scheduler operates by receiving messages from various workers and clients and then performing operations based on those messages. Each message has an operation like "close-worker" or "task-finished". In some high security situations administrators may choose to block certain handlers from running. Those handlers can be listed here. For a list of handlers see the `dask.distributed.Scheduler.handlers` attribute.
-
distributed.scheduler.default-data-size
1kiB ¶ The default size of a piece of data if we don't know anything about it. This is used by the scheduler in some scheduling decisions
-
distributed.scheduler.events-cleanup-delay
1h ¶ The amount of time to wait until workers or clients are removed from the event log after they have been removed from the scheduler
-
distributed.scheduler.idle-timeout
None ¶ Shut down the scheduler after this duration if no activity has occured This can be helpful to reduce costs and stop zombie processes from roaming the earth.
-
distributed.scheduler.transition-log-length
100000 ¶ How long should we keep the transition log Every time a task transitions states (like "waiting", "processing", "memory", "released") we record that transition in a log. To make sure that we don't run out of memory we will clear out old entries after a certain length. This is that length.
-
distributed.scheduler.work-stealing
True ¶ Whether or not to balance work between workers dynamically Some times one worker has more work than we expected. The scheduler will move these tasks around as necessary by default. Set this to false to disable this behavior
-
distributed.scheduler.work-stealing-interval
100ms ¶ How frequently to balance worker loads
-
distributed.scheduler.worker-ttl
None ¶ Time to live for workers. If we don't receive a heartbeat faster than this then we assume that the worker has died.
-
distributed.scheduler.pickle
True ¶ Is the scheduler allowed to deserialize arbitrary bytestrings? The scheduler almost never deserializes user data. However there are some cases where the user can submit functions to run directly on the scheduler. This can be convenient for debugging, but also introduces some security risk. By setting this to false we ensure that the user is unable to run arbitrary code on the scheduler.
-
distributed.scheduler.preload
[] ¶ Run custom modules during the lifetime of the scheduler You can run custom modules when the scheduler starts up and closes down. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information
-
distributed.scheduler.preload-argv
[] ¶ Arguments to pass into the preload scripts described above See https://docs.dask.org/en/latest/setup/custom-startup.html for more information
-
distributed.scheduler.unknown-task-duration
500ms ¶ Default duration for all tasks with unknown durations Over time the scheduler learns a duration for tasks. However when it sees a new type of task for the first time it has to make a guess as to how long it will take. This value is that guess.
-
distributed.scheduler.default-task-durations.rechunk-split
1us ¶ No Comment
-
distributed.scheduler.default-task-durations.shuffle-split
1us ¶ No Comment
-
distributed.scheduler.validate
False ¶ Whether or not to run consistency checks during execution. This is typically only used for debugging.
-
distributed.scheduler.dashboard.status.task-stream-length
1000 ¶ The maximum number of tasks to include in the task stream plot
-
distributed.scheduler.dashboard.tasks.task-stream-length
100000 ¶ The maximum number of tasks to include in the task stream plot
-
distributed.scheduler.dashboard.tls.ca-file
None ¶ No Comment
-
distributed.scheduler.dashboard.tls.key
None ¶ No Comment
-
distributed.scheduler.dashboard.tls.cert
None ¶ No Comment
-
distributed.scheduler.dashboard.bokeh-application.allow_websocket_origin
['*'] ¶ No Comment
-
distributed.scheduler.dashboard.bokeh-application.keep_alive_milliseconds
500 ¶ No Comment
-
distributed.scheduler.dashboard.bokeh-application.check_unused_sessions_milliseconds
500 ¶ No Comment
-
distributed.scheduler.locks.lease-validation-interval
10s ¶ The time to wait until an acquired semaphore is released if the Client goes out of scope
-
distributed.scheduler.locks.lease-timeout
30s ¶ The timeout after which a lease will be released if not refreshed
-
distributed.scheduler.http.routes
['distributed.http.scheduler.prometheus', 'distributed.http.scheduler.info', 'distributed.http.scheduler.json', 'distributed.http.health', 'distributed.http.proxy', 'distributed.http.statics'] ¶ A list of modules like "prometheus" and "health" that can be included or excluded as desired These modules will have a ``routes`` keyword that gets added to the main HTTP Server. This is also a list that can be extended with user defined modules.
Worker¶
-
distributed.worker.blocked-handlers
[] ¶ A list of handlers to exclude The scheduler operates by receiving messages from various workers and clients and then performing operations based on those messages. Each message has an operation like "close-worker" or "task-finished". In some high security situations administrators may choose to block certain handlers from running. Those handlers can be listed here. For a list of handlers see the `dask.distributed.Scheduler.handlers` attribute.
-
distributed.worker.multiprocessing-method
spawn ¶ How we create new workers, one of "spawn", "forkserver", or "fork" This is passed to the ``multiprocessing.get_context`` function.
-
distributed.worker.use-file-locking
True ¶ Whether or not to use lock files when creating workers Workers create a local directory in which to place temporary files. When many workers are created on the same process at once these workers can conflict with each other by trying to create this directory all at the same time. To avoid this, Dask usually used a file-based lock. However, on some systems file-based locks don't work. This is particularly common on HPC NFS systems, where users may want to set this to false.
-
distributed.worker.connections.outgoing
50 ¶ No Comment
-
distributed.worker.connections.incoming
10 ¶ No Comment
-
distributed.worker.preload
[] ¶ Run custom modules during the lifetime of the worker You can run custom modules when the worker starts up and closes down. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information
-
distributed.worker.preload-argv
[] ¶ Arguments to pass into the preload scripts described above See https://docs.dask.org/en/latest/setup/custom-startup.html for more information
-
distributed.worker.daemon
True ¶ Whether or not to run our process as a daemon process
-
distributed.worker.validate
False ¶ Whether or not to run consistency checks during execution. This is typically only used for debugging.
-
distributed.worker.lifetime.duration
None ¶ The time after creation to close the worker, like "1 hour"
-
distributed.worker.lifetime.stagger
0 seconds ¶ Random amount by which to stagger lifetimes If you create many workers at the same time, you may want to avoid having them kill themselves all at the same time. To avoid this you might want to set a stagger time, so that they close themselves with some random variation, like "5 minutes" That way some workers can die, new ones can be brought up, and data can be transferred over smoothly.
-
distributed.worker.lifetime.restart
False ¶ Do we try to resurrect the worker after the lifetime deadline?
-
distributed.worker.profile.interval
10ms ¶ The time between polling the worker threads, typically short like 10ms
-
distributed.worker.profile.cycle
1000ms ¶ The time between bundling together this data and sending it to the scheduler This controls the granularity at which people can query the profile information on the time axis.
-
distributed.worker.profile.low-level
False ¶ Whether or not to use the libunwind and stacktrace libraries to gather profiling information at the lower level (beneath Python) To get this to work you will need to install the experimental stacktrace library at conda install -c numba stacktrace See https://github.com/numba/stacktrace
-
distributed.worker.memory.target
0.6 ¶ Target fraction below which to try to keep memory
-
distributed.worker.memory.spill
0.7 ¶ When the process memory (as observed by the operating system) gets above this amount we spill data to disk.
-
distributed.worker.memory.pause
0.8 ¶ When the process memory (as observed by the operating system) gets above this amount we no longer start new tasks on this worker.
-
distributed.worker.memory.terminate
0.95 ¶ When the process memory reaches this level the nanny process will kill the worker (if a nanny is present)
-
distributed.worker.http.routes
['distributed.http.worker.prometheus', 'distributed.http.health', 'distributed.http.statics'] ¶ A list of modules like "prometheus" and "health" that can be included or excluded as desired These modules will have a ``routes`` keyword that gets added to the main HTTP Server. This is also a list that can be extended with user defined modules.
Admin¶
-
distributed.admin.tick.interval
20ms ¶ The time between ticks, default 20ms
-
distributed.admin.tick.limit
3s ¶ The time allowed before triggering a warning
-
distributed.admin.max-error-length
10000 ¶ Maximum length of traceback as text Some Python tracebacks can be very very long (particularly in stack overflow errors) If the traceback is larger than this size (in bytes) then we truncate it.
-
distributed.admin.log-length
10000 ¶ Default length of logs to keep in memory The scheduler and workers keep the last 10000 or so log entries in memory.
-
distributed.admin.log-format
%(name)s - %(levelname)s - %(message)s ¶ The log format to emit. See https://docs.python.org/3/library/logging.html#logrecord-attributes
-
distributed.admin.pdb-on-err
False ¶ Enter Python Debugger on scheduling error