Optimizing Apache Storm Topologies
This article summarizes hints for optimizing and deploying Apache Storm topologies.
Setup your storm cluster
- I/O is zookeeper’s main bottleneck - ensure that the
/data
partition of zookeeper machines serializes to quick storage (ramdisk ;) - Determine the number of parallelism units using the following rule of thumb:
- number of available CPU cores on all machines minus one core per machine that is used for the Acker
- Example: 2 machines with 48 and 1 machine with 32 cores; parallelism units = 2x(48-1) + (32-1) = 125
- Using multiple workers per machine allows deploying multiple topologies at once (the number of workers is determined by the number of ports configured in the
supervisor.slots.ports
setting instorm.yaml
)
Topology configuration suggestions
- Use one worker per machine and topology (intra-worker transports are more efficient)
- The number of executors depends on whether your bolt is I/O or CPU bound
- CPU bound: configure one executor per available parallelism unit
- I/O bound: use 10-100 workers per parallelism unit, depending on the expected I/O delay
- The total number of parallelism units in your topology should equal the number of available parallelism units
Profiling the topology
- Storm UI: use the capacity metric to identify bolts which require a higher parallelism
- your
nextTuple
andexecute
methods determine the spout’s/bolt’s runtime - optimize these methods - use queue’s for I/O in spouts or terminal bolts (i.e. write final results to a queue and use a writer thread that performs batch inserts to serialize the queue to disk)
Glossary
- workers process - responsible for executing the topology on a particular machine
- executor - thread spawned by the worker for a particular component (bold or spout); the number of executors is configured by setting the
parallelism hint
parameter in thesetSpout
orsetBolt
method. - task - number of instances of a particular bolt/spout to deploy; configuring more than one task using
setNumTasks(n)
allows to later increase the number of executors for that particular spout/bolt without redeploying the topology.
Leave a comment