Skip to content

Crawler options 0.7.13+

📝 Name: crawlerOptions · 🖥️ Option: --crawler-options

Additional options for configurable crawlers.

INFO

These options only apply to configurable crawlers. If the configured crawler does not implement the required interface, a warning is shown.

Example

Pass crawler options in the expected input format.

IMPORTANT

When passing crawler options as command parameter or environment variable, make sure to pass them as JSON-encoded string.

bash
./cache-warmup.phar --crawler-options '{"concurrency": 3, "request_options": {"delay": 3000}}'
json
{
    "crawlerOptions": {
        "concurrency": 3,
        "request_options": {
            "delay": 3000
        }
    }
}
php
use EliasHaeussler\CacheWarmup;

return static function (CacheWarmup\Config\CacheWarmupConfig $config) {
    $config->setCrawlerOption('concurrency', 3);
    $config->setCrawlerOption('request_options', [
        'delay' => 3000,
    ]);

    return $config;
};
yaml
crawlerOptions:
  concurrency: 3
  request_options:
    delay: 3000
bash
CACHE_WARMUP_CRAWLER_OPTIONS='{"concurrency": 3, "request_options": {"delay": 3000}}'

Option Reference

Both default crawlers are implemented as configurable crawlers:

The following configuration options are currently available for both crawlers:

concurrency 0.7.13+

🎨 Type: integer · 🐝 Default: 3

Define how many URLs are crawled concurrently.

INFO

Internally, Guzzle's Pool feature is used to send multiple requests concurrently using asynchronous requests. You may also have a look at how this is implemented in the library's RequestPoolFactory.

bash
./cache-warmup.phar --crawler-options '{"concurrency": 5}'
json
{
    "crawlerOptions": {
        "concurrency": 5
    }
}
php
use EliasHaeussler\CacheWarmup;

return static function (CacheWarmup\Config\CacheWarmupConfig $config) {
    $config->setCrawlerOption('concurrency', 5);

    return $config;
};
yaml
crawlerOptions:
  concurrency: 5
bash
CACHE_WARMUP_CRAWLER_OPTIONS='{"concurrency": 5}'

request_headers 0.7.13+

🎨 Type: array<string, mixed> · 🐝 Default: ['User-Agent' => '<default user-agent>']

A list of HTTP headers to send with each cache warmup request.

INFO

The default User-Agent is built in ConcurrentCrawlerTrait::getRequestHeaders().

bash
./cache-warmup.phar --crawler-options '{"request_headers": {"X-Foo": "bar", "User-Agent": "Foo-Crawler/1.0"}}'
json
{
    "crawlerOptions": {
        "request_headers": {
            "X-Foo": "bar",
            "User-Agent": "Foo-Crawler/1.0"
        }
    }
}
php
use EliasHaeussler\CacheWarmup;

return static function (CacheWarmup\Config\CacheWarmupConfig $config) {
    $config->setCrawlerOption('request_headers', [
        'X-Foo' => 'bar',
        'User-Agent' => 'Foo-Crawler/1.0',
    ]);

    return $config;
};
yaml
crawlerOptions:
  request_headers:
    X-Foo: bar
    User-Agent: 'Foo-Crawler/1.0'
bash
CACHE_WARMUP_CRAWLER_OPTIONS='{"request_headers": {"X-Foo": "bar", "User-Agent": "Foo-Crawler/1.0"}}'

request_method 0.7.13+

🎨 Type: string · 🐝 Default: HEAD

The HTTP method used to perform cache warmup requests.

bash
./cache-warmup.phar --crawler-options '{"request_method": "GET"}'
json
{
    "crawlerOptions": {
        "request_method": "GET"
    }
}
php
use EliasHaeussler\CacheWarmup;

return static function (CacheWarmup\Config\CacheWarmupConfig $config) {
    $config->setCrawlerOption('request_method', 'GET');

    return $config;
};
yaml
crawlerOptions:
  request_method: GET
bash
CACHE_WARMUP_CRAWLER_OPTIONS='{"request_method": "GET"}'

request_options 2.0+

🎨 Type: array<string, mixed> · 🐝 Default: []

Additional request options used for each cache warmup request.

bash
./cache-warmup.phar --crawler-options '{"request_options": {"delay": 500, "timeout": 10}}'
json
{
    "crawlerOptions": {
        "request_options": {
            "delay": 500,
            "timeout": 10
        }
    }
}
php
use EliasHaeussler\CacheWarmup;

return static function (CacheWarmup\Config\CacheWarmupConfig $config) {
    $config->setCrawlerOption('request_options', [
        'delay' => 500,
        'timeout' => 10,
    ]);

    return $config;
};
yaml
crawlerOptions:
  request_options:
    delay: 500
    timeout: 10
bash
CACHE_WARMUP_CRAWLER_OPTIONS='{"request_options": {"delay": 500, "timeout": 10}}'

write_response_body 4.0+

🎨 Type: bool · 🐝 Default: false

Define whether or not to write response body of crawled URLs to the corresponding response object.

WARNING

Enabling this option may significantly increase memory consumption during cache warmup.

bash
./cache-warmup.phar --crawler-options '{"write_response_body": true}'
json
{
    "crawlerOptions": {
        "write_response_body": true
    }
}
php
use EliasHaeussler\CacheWarmup;

return static function (CacheWarmup\Config\CacheWarmupConfig $config) {
    $config->setCrawlerOption('write_response_body', true);

    return $config;
};
yaml
crawlerOptions:
  write_response_body: true
bash
CACHE_WARMUP_CRAWLER_OPTIONS='{"write_response_body": true}'

Released under the GNU General Public License 3.0 (or later)