phinde.git
Tag v0.2.1
Christian Weiske [Sat, 3 Dec 2016 21:32:19 +0000 (22:32 +0100)]
add log file support

Christian Weiske [Sat, 3 Dec 2016 21:32:08 +0000 (22:32 +0100)]
Do not crash status page when gearman worker is not registered

Christian Weiske [Sat, 3 Dec 2016 13:28:21 +0000 (14:28 +0100)]
Remove URL check from process.php. Checking is done in Crawler already

Christian Weiske [Sat, 3 Dec 2016 13:18:53 +0000 (14:18 +0100)]
blacklist config option is not used

Christian Weiske [Sat, 3 Dec 2016 12:15:48 +0000 (13:15 +0100)]
baseurl config option

Tag v0.2.0
Christian Weiske [Fri, 25 Nov 2016 06:54:49 +0000 (07:54 +0100)]
script to renew websub subscriptions

Christian Weiske [Thu, 24 Nov 2016 22:11:52 +0000 (23:11 +0100)]
show subscriptions on status page

Christian Weiske [Thu, 24 Nov 2016 21:38:56 +0000 (22:38 +0100)]
help text in sidebar

Christian Weiske [Thu, 24 Nov 2016 21:24:17 +0000 (22:24 +0100)]
autofocus input field if there is no query

Christian Weiske [Thu, 24 Nov 2016 21:20:43 +0000 (22:20 +0100)]
make search bar visible on status page

Christian Weiske [Thu, 24 Nov 2016 21:20:33 +0000 (22:20 +0100)]
link status page

Christian Weiske [Thu, 24 Nov 2016 21:09:28 +0000 (22:09 +0100)]
websub subscriptions work

Christian Weiske [Thu, 17 Nov 2016 17:21:14 +0000 (18:21 +0100)]
Configuration for default sort order

Christian Weiske [Wed, 16 Nov 2016 10:14:23 +0000 (11:14 +0100)]
new pager

Christian Weiske [Fri, 11 Nov 2016 20:26:34 +0000 (21:26 +0100)]
format document number on status page

Christian Weiske [Fri, 11 Nov 2016 20:13:56 +0000 (21:13 +0100)]
improve status page

Christian Weiske [Fri, 11 Nov 2016 19:54:12 +0000 (20:54 +0100)]
status page

Christian Weiske [Thu, 10 Nov 2016 19:52:35 +0000 (20:52 +0100)]
add log class

Christian Weiske [Thu, 10 Nov 2016 14:22:05 +0000 (15:22 +0100)]
pager: move next and prev links to the outside for easier clicking

Christian Weiske [Thu, 10 Nov 2016 14:13:51 +0000 (15:13 +0100)]
add command to shut down a worker

Christian Weiske [Wed, 9 Nov 2016 20:46:05 +0000 (21:46 +0100)]
properly handle noindex pages

Christian Weiske [Mon, 7 Nov 2016 20:41:36 +0000 (21:41 +0100)]
Big patch merging crawling+indexing into one command, new json document structure

Christian Weiske [Sun, 6 Nov 2016 16:16:15 +0000 (17:16 +0100)]
setup: check json before dropping current index

Christian Weiske [Fri, 2 Sep 2016 16:05:00 +0000 (18:05 +0200)]
Make title configurable

Resolves: #11

Christian Weiske [Fri, 2 Sep 2016 16:04:30 +0000 (18:04 +0200)]
Link github

Christian Weiske [Fri, 2 Sep 2016 16:01:58 +0000 (18:01 +0200)]
Support multiple "nick:" terms in search field

Resolves: #17

Christian Weiske [Fri, 2 Sep 2016 15:54:15 +0000 (17:54 +0200)]
performance debug timer

Christian Weiske [Fri, 2 Sep 2016 13:20:17 +0000 (15:20 +0200)]
Fix chat log links

Resolves: #16

Christian Weiske [Fri, 2 Sep 2016 09:01:28 +0000 (11:01 +0200)]
massively improve crawl speed by ditching "exists" queries

Christian Weiske [Thu, 1 Sep 2016 18:36:23 +0000 (20:36 +0200)]
micro optimization for "exists" ES queries

Christian Weiske [Thu, 1 Sep 2016 06:11:44 +0000 (08:11 +0200)]
Make search result hit template configurable, add chat template

Resolves: #9

Christian Weiske [Thu, 1 Sep 2016 05:47:49 +0000 (07:47 +0200)]
Always show text, make text extract size configurable.

Resolves: #8

Christian Weiske [Thu, 1 Sep 2016 05:38:08 +0000 (07:38 +0200)]
remove anchor from source URLs
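A minimal Python sketch of what removing the anchor from a source URL amounts to. It uses a hypothetical helper named `strip_anchor`. phinde's own PHP code (see process.php above) is not reproduced here. The `#fragment` part is dropped so that `page.html#a` and `page.html#b` resolve to the same document:

```python
from urllib.parse import urldefrag


def strip_anchor(url: str) -> str:
    """Hypothetical helper: return the URL without its #fragment part."""
    # urldefrag() splits off the fragment; .url is everything before '#'
    return urldefrag(url).url


print(strip_anchor("http://example.org/page.html#comments"))
```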

Christian Weiske [Tue, 30 Aug 2016 19:37:50 +0000 (21:37 +0200)]
tell why crawler stops

Christian Weiske [Tue, 30 Aug 2016 11:35:05 +0000 (13:35 +0200)]
Add crawlBlacklist configuration option

Resolves: #7

Christian Weiske [Tue, 30 Aug 2016 11:10:03 +0000 (13:10 +0200)]
Allow worker instances of multiple projects in parallel

Change "queuePrefix" configuration in each project

Resolves: #5

Christian Weiske [Tue, 30 Aug 2016 11:05:14 +0000 (13:05 +0200)]
Fix notice

Christian Weiske [Tue, 30 Aug 2016 11:03:26 +0000 (13:03 +0200)]
Make phinde-worker configurable; allow queue selection

Resolves: #6

Christian Weiske [Tue, 30 Aug 2016 06:13:33 +0000 (08:13 +0200)]
Option to disable linked URL indexing

Resolves: #2

Christian Weiske [Tue, 30 Aug 2016 06:05:00 +0000 (08:05 +0200)]
Add support for modification date queries: "before:", "after:" and "date:"

Resolves: #4

Christian Weiske [Tue, 30 Aug 2016 05:36:34 +0000 (07:36 +0200)]
Support "nick:cweiske" search syntax as alias for "author.name"

Resolves: #3
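The two query-syntax commits above introduce "nick:" as an alias for "author.name" and the "before:", "after:" and "date:" modification-date terms. They can be illustrated with a small Python tokenizer. This is a sketch under stated assumptions, not phinde's PHP parser. The function name `parse_query` and the shape of the returned dict are hypothetical. Repeated "nick:" terms are collected into a list, matching the later commit that supports multiple "nick:" terms.

```python
ALIASES = {"nick": "author.name"}          # "nick:" is an alias for author.name
DATE_PREFIXES = ("before", "after", "date")


def parse_query(query: str) -> dict:
    """Hypothetical tokenizer: split prefix terms from free text."""
    terms = []
    result = {"author.name": [], "before": None, "after": None, "date": None}
    for token in query.split():
        prefix, sep, value = token.partition(":")
        if sep and value and prefix in ALIASES:
            # several "nick:" terms may be given; keep all of them
            result[ALIASES[prefix]].append(value)
        elif sep and value and prefix in DATE_PREFIXES:
            # assumption: the last term of each date kind wins
            result[prefix] = value
        else:
            terms.append(token)
    result["text"] = " ".join(terms)
    return result


print(parse_query("websub nick:cweiske after:2016-01-01"))
```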

Christian Weiske [Mon, 29 Aug 2016 20:59:16 +0000 (22:59 +0200)]
Respect <meta name="robots" content="noindex"/>

Fixes: #1
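Respecting `<meta name="robots" content="noindex"/>` starts with reading the directives from the page. A minimal Python sketch follows, using the standard library's `html.parser` (an assumption; phinde's PHP code is not shown here). `RobotsMetaParser` and `robots_directives` are hypothetical names. A crawler would skip indexing when "noindex" is present and, in line with the later "nofollow" commit, skip queueing links when "nofollow" is present.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # self-closing <meta .../> is routed here too by handle_startendtag
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


print(robots_directives('<meta name="robots" content="noindex, nofollow"/>'))
```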

Christian Weiske [Mon, 29 Aug 2016 18:30:45 +0000 (20:30 +0200)]
Send If-Modified-Since header on crawling and indexing
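Sending If-Modified-Since lets the server answer 304 Not Modified instead of resending an unchanged page. A minimal Python sketch using only the standard library follows. The helper names and the idea of passing the last crawl time as a Unix timestamp are assumptions, not phinde's implementation. `urllib` reports a 304 as an `HTTPError`, so the sketch catches it and returns `None`.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_headers(last_crawled: float) -> dict:
    """Headers for re-fetching a URL last crawled at a Unix timestamp."""
    # HTTP dates must be in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    return {"If-Modified-Since": formatdate(last_crawled, usegmt=True)}


def fetch_if_modified(url: str, last_crawled: float):
    """Return the body bytes, or None if the server replies 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(last_crawled))
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # unchanged since the last crawl: nothing to reindex
        raise


print(conditional_headers(0))
```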

Christian Weiske [Thu, 26 May 2016 13:20:23 +0000 (15:20 +0200)]
add LICENSE file

Christian Weiske [Thu, 31 Mar 2016 18:46:01 +0000 (20:46 +0200)]
wip pubsubhubbub

Christian Weiske [Fri, 12 Feb 2016 16:04:42 +0000 (17:04 +0100)]
opensearch paging

Christian Weiske [Fri, 12 Feb 2016 06:43:25 +0000 (07:43 +0100)]
trim query string

Tag v0.1.0
Christian Weiske [Thu, 11 Feb 2016 21:43:34 +0000 (22:43 +0100)]
opensearch support

Christian Weiske [Thu, 11 Feb 2016 19:02:30 +0000 (20:02 +0100)]
support base href
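Supporting `<base href>` means that relative links are resolved against the base URL instead of the page URL. A minimal Python sketch follows, assuming the standard library's `html.parser` and `urljoin`. The names `LinkCollector` and `absolute_links` are hypothetical. A relative base href is itself resolved against the page URL first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Records the first <base href> and every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "base" and self.base is None and attributes.get("href"):
            self.base = attributes["href"]
        elif tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def absolute_links(page_url: str, html: str) -> list:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    # the base href may itself be relative to the page URL
    base = urljoin(page_url, collector.base) if collector.base else page_url
    return [urljoin(base, href) for href in collector.hrefs]


print(absolute_links("http://example.org/blog/post.html",
                     '<base href="/archive/"><a href="entry.html">x</a>'))
```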

Christian Weiske [Thu, 11 Feb 2016 16:37:12 +0000 (17:37 +0100)]
sanitize title better

Christian Weiske [Thu, 11 Feb 2016 16:00:58 +0000 (17:00 +0100)]
use correct meta robots attribute

Christian Weiske [Thu, 11 Feb 2016 07:43:01 +0000 (08:43 +0100)]
debug option for crawler

Christian Weiske [Wed, 10 Feb 2016 21:02:11 +0000 (22:02 +0100)]
add date sorting

Christian Weiske [Wed, 10 Feb 2016 20:15:35 +0000 (21:15 +0100)]
remove debug statement

Christian Weiske [Wed, 10 Feb 2016 16:26:15 +0000 (17:26 +0100)]
crawler supports "nofollow" now

Christian Weiske [Wed, 10 Feb 2016 16:09:56 +0000 (17:09 +0100)]
send accept header during crawl

Christian Weiske [Wed, 10 Feb 2016 14:14:34 +0000 (15:14 +0100)]
some styling, noindex for search result pages

Christian Weiske [Wed, 10 Feb 2016 13:56:20 +0000 (14:56 +0100)]
rework crawler; add atom link extraction

Christian Weiske [Sat, 6 Feb 2016 19:27:58 +0000 (20:27 +0100)]
about section readme

Christian Weiske [Fri, 5 Feb 2016 05:48:45 +0000 (06:48 +0100)]
add site GET parameter

Christian Weiske [Thu, 4 Feb 2016 22:59:52 +0000 (23:59 +0100)]
default config

Christian Weiske [Thu, 4 Feb 2016 22:58:00 +0000 (23:58 +0100)]
do not exit on null query

Christian Weiske [Thu, 4 Feb 2016 22:55:41 +0000 (23:55 +0100)]
check for content attributes

Christian Weiske [Thu, 4 Feb 2016 22:46:45 +0000 (23:46 +0100)]
remove multiple tags

Christian Weiske [Thu, 4 Feb 2016 16:23:14 +0000 (17:23 +0100)]
do not show filter headline if there are none

Christian Weiske [Thu, 4 Feb 2016 16:20:23 +0000 (17:20 +0100)]
show query time

Christian Weiske [Thu, 4 Feb 2016 16:12:14 +0000 (17:12 +0100)]
change default query operator to AND
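With AND as the default operator, every search term must match. Elasticsearch's `query_string` query defaults to OR, where any single term suffices. The Python sketch below builds such a request body as a plain dict. It assumes that phinde sends a `query_string` query, which the log does not confirm. The function name `build_search_body` is hypothetical.

```python
def build_search_body(query: str, size: int = 10, offset: int = 0) -> dict:
    """Hypothetical Elasticsearch request body with AND as default operator."""
    return {
        "query": {
            "query_string": {
                "query": query,
                # AND: "php search" only matches documents containing both terms
                "default_operator": "AND",
            }
        },
        "from": offset,
        "size": size,
    }


print(build_search_body("php search"))
```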

Christian Weiske [Thu, 4 Feb 2016 16:10:49 +0000 (17:10 +0100)]
Show site search reset link

Christian Weiske [Thu, 4 Feb 2016 15:58:33 +0000 (16:58 +0100)]
escape html in search results

Christian Weiske [Wed, 3 Feb 2016 21:37:15 +0000 (22:37 +0100)]
fix indexing, boost config

Christian Weiske [Wed, 3 Feb 2016 21:18:52 +0000 (22:18 +0100)]
no simplexml anymore, content extraction improvements

Christian Weiske [Wed, 3 Feb 2016 20:25:34 +0000 (21:25 +0100)]
follow redirect, do not verify ssl certificates, use final after-redirect url

Christian Weiske [Wed, 3 Feb 2016 20:12:17 +0000 (21:12 +0100)]
add site search, highlighting

Christian Weiske [Wed, 3 Feb 2016 19:03:35 +0000 (20:03 +0100)]
show elasticsearch query time

Christian Weiske [Wed, 3 Feb 2016 16:23:06 +0000 (17:23 +0100)]
filtering works

Christian Weiske [Wed, 3 Feb 2016 05:21:30 +0000 (06:21 +0100)]
first frontend

Christian Weiske [Mon, 1 Feb 2016 19:18:59 +0000 (20:18 +0100)]
first kinda working version