**********************************
phinde - generic web search engine
**********************************
Self-hosted search engine you can use for your static blog or just about
any other website you want search functionality for.

My live instance is at http://search.cweiske.de/ and indexes my
website, blog and all linked URLs.

- Crawler and indexer that can run as many parallel processes
- Shows and highlights text that contains search words
- Boolean search queries:

  - ``foo bar`` searches for ``foo AND bar``
  - ``title:foo`` searches for ``foo`` only in the page title

- Facets for tag, domain, language and type
- Date filters:

  - ``before:2016-08-30`` - modified before that day
  - ``after:2016-08-30`` - modified after that day
  - ``date::2016-08-30`` - exact modification day match

- Restricting the search to a site or directory:

  - Query: ``foo bar site:example.org/dir/``
  - or use the ``site`` GET parameter:
    ``/?q=foo&site=example.org/dir``

- OpenSearch support with HTML and Atom result lists
- Instant indexing with WebSub (formerly PubSubHubbub)
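
The query syntax above can also be exercised from the command line.
The following is a minimal sketch and not part of phinde: the function name
``phinde_search`` and the instance URL ``http://phinde.example.org/`` are
placeholders, and it assumes ``curl`` is installed.
It uses ``curl -G`` together with ``--data-urlencode`` so that spaces and
other special characters in the query are URL-encoded.

```shell
# Hypothetical instance URL; override PHINDE_URL for your own installation.
PHINDE_URL="${PHINDE_URL:-http://phinde.example.org/}"

# phinde_search QUERY [SITE]
# Sends a GET request with the "q" parameter and, optionally, "site".
phinde_search() {
    if [ -n "${2:-}" ]; then
        curl -sG "$PHINDE_URL" \
            --data-urlencode "q=$1" \
            --data-urlencode "site=$2"
    else
        curl -sG "$PHINDE_URL" --data-urlencode "q=$1"
    fi
}
```

For example, ``phinde_search 'foo bar' example.org/dir`` would request the
result list for ``foo bar`` limited to ``example.org/dir``.
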

#. Install and run Elasticsearch and Gearman
#. Get a local copy of the code::

     $ git clone https://git.cweiske.de/phinde.git phinde

#. Install dependencies via PEAR::

     $ pear install console_commandline net_url2

#. Point your webserver's document root to phinde's ``www`` directory
#. Copy ``data/config.php.dist`` to ``data/config.php`` and adjust it.
   Make sure you add your domain to the crawl whitelist.
#. Run ``bin/setup.php``, which sets up the Elasticsearch schema
#. Put your homepage into the queue::

     $ ./bin/process.php http://example.org/

#. Start at least one worker to process the crawl+index queue::

     $ ./bin/phinde-worker.php

#. Check phinde's status page in your browser.
   Both the number of open tasks and the number of workers should be
   greater than 0.

Re-index when your site changes
===============================
When your site changes, the search engine needs to re-crawl and re-index it.

Simply tell phinde that something changed by running::

    $ ./bin/process.php http://example.org/foo.htm

phinde supports HTML pages and Atom feeds, so if your blog has a feed,
it's enough to let phinde reindex that one.
It will find all linked pages automatically.
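
When several pages or feeds change at once, the call above can be wrapped in
a small loop.
This is only a sketch: the function name ``phinde_requeue`` and the
``PHINDE_PROCESS`` variable are made up for illustration, and it assumes
that it runs from the phinde checkout so that ``./bin/process.php`` resolves.

```shell
# Path to phinde's queueing script; PHINDE_PROCESS is a hypothetical override.
PHINDE_PROCESS="${PHINDE_PROCESS:-./bin/process.php}"

# phinde_requeue URL...
# Queues each URL for crawling and indexing, reporting failures on stderr.
# Returns 1 if at least one URL could not be queued.
phinde_requeue() {
    status=0
    for url in "$@"; do
        if ! "$PHINDE_PROCESS" "$url"; then
            echo "could not queue $url" >&2
            status=1
        fi
    done
    return "$status"
}
```

A deploy script could then call, for instance,
``phinde_requeue http://example.org/feed.atom`` (a hypothetical feed URL)
after publishing new content.
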

Adding a simple search form to your website is easy.
The form needs:

- ``<form>`` tag with an action that points to the phinde instance
- Search text field with the name ``q``

Example::

    <form method="get" action="http://phinde.example.org">
      <input type="text" name="q" placeholder="Search text"/>
      <button type="submit">Search</button>
    </form>

When using systemd, you can let it run multiple worker instances:

#. Copy the files ``data/systemd/phinde*.service`` into ``/etc/systemd/system/``
#. Adjust the user and group names and the working directories
#. Enable three worker processes::

     $ systemctl daemon-reload
     $ systemctl enable phinde@1
     $ systemctl enable phinde@2
     $ systemctl enable phinde@3
     $ systemctl enable phinde
     $ systemctl start phinde

#. Now three workers are running. Restarting the ``phinde`` service also
   restarts the workers.
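
If you want a different number of workers, the ``enable`` calls above can be
generated in a loop.
The sketch below assumes the unit files are named as in the steps above;
``phinde_enable_workers`` is a hypothetical helper name, and the commands
typically need root privileges.

```shell
# phinde_enable_workers COUNT
# Enables the instances phinde@1 .. phinde@COUNT and the phinde service,
# then starts the phinde service.
phinde_enable_workers() {
    count=$1
    systemctl daemon-reload
    i=1
    while [ "$i" -le "$count" ]; do
        systemctl enable "phinde@$i"
        i=$((i + 1))
    done
    systemctl enable phinde
    systemctl start phinde
}
```

Calling ``phinde_enable_workers 3`` issues the same commands as the manual
steps listed above.
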

Run ``bin/renew-subscriptions.php`` once a day with cron.
It will renew the WebSub subscriptions.
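
One way to set this up is a crontab entry for the user that runs phinde.
The sketch below appends such an entry to that user's crontab; the
installation directory argument and the run time of 03:15 are arbitrary
assumptions, and ``phinde_install_cron`` is a hypothetical helper name.

```shell
# phinde_install_cron PHINDE_DIR
# Appends a daily job to the current user's crontab.
# Running it twice adds the entry twice, so check "crontab -l" first.
phinde_install_cron() {
    {
        # Keep existing entries; ignore the error when no crontab exists yet.
        crontab -l 2>/dev/null || true
        echo "15 3 * * * $1/bin/renew-subscriptions.php"
    } | crontab -
}
```

For example, ``phinde_install_cron /path/to/phinde`` schedules the renewal
script of a checkout in ``/path/to/phinde``.
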

Delete index data from one domain::

    $ curl -iv -XDELETE -H 'Content-Type: application/json' -d '{"query":{"term":{"domain":"example.org"}}}' http://127.0.0.1:9200/phinde/_query

This uses the delete-by-query plugin for Elasticsearch 2.0; see
https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html

phinde's source code is available from http://git.cweiske.de/phinde.git
or the `mirror on GitHub`__.

__ https://github.com/cweiske/phinde

phinde is licensed under the `AGPL v3 or later`__.

__ http://www.gnu.org/licenses/agpl.html

phinde was written by `Christian Weiske`__.

__ http://cweiske.de/