X-Git-Url: https://git.cweiske.de/phinde.git/blobdiff_plain/f67e8f0bc3f51f2d280a86a8c7cffa68d812efe1..4fa1d0022a87079f77eddb6a55ad67c82e9c1be3:/README.rst

diff --git a/README.rst b/README.rst
index 40f8c55..3ce17e0 100644
--- a/README.rst
+++ b/README.rst
@@ -19,20 +19,127 @@ Features
   - ``foo OR bar``
   - ``title:foo`` searches for ``foo`` only in the page title
 - Facets for tag, domain, language and type
+- Date search:
+
+  - ``before:2016-08-30`` - modification date before that day
+  - ``after:2016-08-30`` - modified after that day
+  - ``date::2016-08-30`` - exact modification day match
 - Site search

   - Query: ``foo bar site:example.org/dir/``
   - or use the ``site`` GET parameter: ``/?q=foo&site=example.org/dir``
+- OpenSearch support with HTML and Atom result lists
+- Instant indexing with WebSub (formerly PubSubHubbub)


 ============
 Dependencies
 ============
 - PHP 5.5+
-- elasticsearch 2.0
-- gearman
+- Elasticsearch 2.0
+- Gearman
+- Console_CommandLine
 - Net_URL2
+- Twig 1.x
+
+
+=====
+Setup
+=====
+#. Install and run Elasticsearch and Gearman
+#. Get a local copy of the code::
+
+    $ git clone https://git.cweiske.de/phinde.git phinde
+
+#. Install dependencies via PEAR::
+
+    $ pear install console_commandline net_url2
+    $ pear channel-discover pear.twig-project.org
+    $ pear install twig/Twig
+
+#. Point your webserver's document root to phinde's ``www`` directory
+#. Copy ``data/config.php.dist`` to ``data/config.php`` and adjust it.
+   Make sure you add your domain to the crawl whitelist.
+#. Run ``bin/setup.php``, which sets up the Elasticsearch schema
+#. Put your homepage into the queue::
+
+    $ ./bin/process.php http://example.org/
+
+#. Start at least one worker to process the crawl+index queue::
+
+    $ ./bin/phinde-worker.php
+
+#. Check phinde's status page in your browser.
+   The number of open tasks and the number of workers should both be > 0.
+
+
+Re-index when your site changes
+===============================
+When your site changes, the search engine needs to re-crawl and re-index
+the pages.
+
+Simply tell phinde that something changed by running::
+
+    $ ./bin/process.php http://example.org/foo.htm
+
+phinde supports HTML pages and Atom feeds, so if your blog has a feed
+it's enough to let phinde reindex that one.
+It will find all linked pages automatically.
+
+
+Website integration
+===================
+Adding a simple search form to your website is easy.
+It needs two things:
+
+- A ``<form>`` tag with an action that points to the phinde instance
+- A search text field named ``q``
+
+Example::
+
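+  <!-- Illustrative sketch only, not the README's original markup.
+       Assumptions: phinde.example.org is a placeholder for your own
+       phinde instance, and the form submits via GET. Only the field
+       name "q" is taken from the text above. -->
+  <form method="get" action="http://phinde.example.org/">
+    <input type="search" name="q" placeholder="Search text"/>
+    <button type="submit">Search</button>
+  </form>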
+
+
+System service
+==============
+When using systemd, you can let it run multiple worker instances when
+the system boots up:
+
+#. Copy files ``data/systemd/phinde*.service`` into ``/etc/systemd/system/``
+#. Adjust user and group names, and the work directories
+#. Enable three worker processes::
+
+    $ systemctl daemon-reload
+    $ systemctl enable phinde@1
+    $ systemctl enable phinde@2
+    $ systemctl enable phinde@3
+    $ systemctl enable phinde
+    $ systemctl start phinde
+
+#. Now three workers are running. Restarting the ``phinde`` service also
+   restarts the workers.
+
+
+
+Cron job
+========
+Run ``bin/renew-subscriptions.php`` once a day with cron.
+It will renew the WebSub subscriptions.
+
+
+=====
+Howto
+=====
+
+Delete index data from one domain::
+
+    $ curl -iv -XDELETE -H 'Content-Type: application/json' -d '{"query":{"term":{"domain":"example.org"}}}' http://127.0.0.1:9200/phinde/_query
+
+That's delete-by-query 2.0, see
+https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html


 ============