A replacement for catalog/includes/spiders.txt - updated with newly seen spiders and optimized for quicker processing. For 2.2-MS2 or later.
Comments, questions or suggestions should go in the support topic at http://forums.oscommerce.com/index.php?showtopic=112609
Added strings:
lbot (mlbot)
scoutjet
Corrected "pansclient" to "panscient".
Added strings:
nokia6682/
pansclient
vouager-hc
Added strings blaiz, yodao. Removed java/ which was causing problems for some payment processors.
Changed generic string "seek" to "seeker" and "seek." so as not to trip over users of the SeekMoToolbar.
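The effect of that change can be sketched in a few lines, assuming each line of spiders.txt is matched as a lowercase substring of the visitor's user agent. The function name and both user agent strings below are hypothetical, chosen only for illustration.

```python
def is_spider(user_agent, patterns):
    """Return True if any non-empty pattern is a substring of the lowercased user agent."""
    ua = user_agent.lower()
    return any(p in ua for p in patterns if p)

# Hypothetical user agent of an ordinary browser with the toolbar installed.
toolbar_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SeekmoToolbar 4.8.4)"
# Hypothetical spider whose name contains "seeker".
spider_ua = "ExampleSeeker/1.0 (+http://example.com/bot)"

old_patterns = ["seek"]             # generic string: too broad
new_patterns = ["seeker", "seek."]  # narrower replacements

print(is_spider(toolbar_ua, old_patterns))  # the toolbar user is flagged as a spider
print(is_spider(toolbar_ua, new_patterns))  # the toolbar user is no longer flagged
print(is_spider(spider_ua, new_patterns))   # the spider is still caught
```

The narrower strings trade a little coverage for fewer false positives on real shoppers.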
Added strings:
vbot
-bot
pagebull
pogodak
snappy
Added strings:
charlotte
publisher
Added string:
microsoft url control
Added strings:
combine
twisted
depspid
Added strings curl, minirank, yandex
Added strings:
/bot
biglotron
gralon
Added strings:
.bot
adressendeutschland
kretrieve
openintelligencedata
Updated spiders.txt
Thanks to the other contributors!
Best Spiders List with knocker Spiders List!
Added strings:
page_verifier
python
Removed stonebridgecomputing's edits, which were incorrect and inappropriate. Added the following strings:
comagent (192.comagent)
psycheclone
yeti
walker (skywalker, also now covers LinkWalker)
wwwster
dataparksearch
onetszukaj
Added the following strings:
Sensis Web Crawler
inktomisearch.com
Cheers,
Chris
Added strings:
zbot (gizbot3)
noyona
sphider
ditto
Added strings:
genieknows
najdi
Consolidated several others to "seek".
Removed "ox/", which had the side effect of disabling Firefox. Sorry about that...
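A short string like this can be checked against ordinary browsers before it goes into the list. The sketch below assumes the list is applied as lowercase substring matching; the helper name and the sample user agents are illustrative only and are not part of this contribution.

```python
def misflagged_browsers(candidate, browser_agents):
    """Return the browser user agents that a candidate spider string would match."""
    needle = candidate.strip().lower()
    if not needle:
        return []
    return [ua for ua in browser_agents if needle in ua.lower()]

# Sample browser user agents, for illustration only.
BROWSER_AGENTS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Opera/9.02 (Windows NT 5.1; U; en)",
]

for candidate in ["ox/", "csci"]:
    hits = misflagged_browsers(candidate, BROWSER_AGENTS)
    status = "UNSAFE" if hits else "ok"
    print(f"{candidate!r}: {status} ({len(hits)} browser(s) matched)")
```

Any candidate that matches a real browser here would treat that browser's users as spiders.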
Added strings:
csci
dtaagent
ebingbong
ox/
Added string bdfetch, enhanced readme.txt
Removed previous edits as they are redundant. Added:
digger
mapoftheinternet
wire
Added the following:
npbot
almaden.ibm.com
ask jeeves
bdcindexer
become
Delorie.com
holmes
homer
kit_fireball
mnogo
poodle
seventwentyfour
worm
to spiders-large.txt
Removed gigabot, as it was already covered
Added tarantula and wavefire
gigabot (from gigablast.com) added.
Strings added:
poirot
silk
theophrastus
twiceler
updated
xirq
Also added bot/ and _bot which may catch some more. And though I have not seen these at my sites, I added strings blog and blo. to accommodate the previous contributor.
Added:
blogslive
ping.blo.gs
topicblogs
Added strings:
ejupiter
heritrix (metacarta)
htdig
jakarta
Added strings:
ccubee
salty
spinner (webspinner.de)
Added string:
sbider (sitesell.com)
Added:
helix (www.sitesearch.ca)
Added strings:
bot.
cfetch
Strings added:
miva
mbot (for dumbot)
tbot (for btbot)
libwww (already in large list)
Strings added:
kbot (mojeekbot)
mnogo (mnogosearch)
Strings added:
falcon
ingrid
omni
sna-
sygol
.... (Strange spider that doesn't obey robots.txt)
Strings added:
boitho
dmoz
findlinks
Added:
pbot (for aipbot)
multitext
shopwiki (thanks Rob123 for this one)
Added "harvest" to small list - was already in large list.
Added strings:
atlocal
objectssearch
Strings added:
java/
volcano
wget
Strings added:
ivia
/teoma (replaces teoma)
Strings added:
homerweb
lwp
Strings added:
smartwit
holmes
accoona
mj12
mediapartners
Updated to catch the following spiders:
Xenu (not really a spider, but a link checking tool you might use)
aspseek
StackRambler
Knowledge.com
Reorganized and optimized to catch more spiders and to do it faster - common substrings moved to front of list. New robots seen include geonabot and innerprisebot.
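Why ordering matters can be illustrated with a rough sketch, assuming the list is scanned top to bottom and the scan stops at the first substring match. The function name, the short pattern lists, and the user agent below are hypothetical.

```python
def checks_until_match(user_agent, patterns):
    """Return how many patterns are tested before the first match, or None if nothing matches."""
    ua = user_agent.lower()
    for count, pattern in enumerate(patterns, start=1):
        if pattern and pattern in ua:
            return count
    return None

# Hypothetical orderings of the same strings.
specific_first = ["geonabot", "innerprisebot", "gigabot", "bot"]
common_first = ["bot", "geonabot", "innerprisebot", "gigabot"]

ua = "ExampleBot/1.0 (+http://example.com/)"  # hypothetical spider user agent

print(checks_until_match(ua, specific_first))  # the common string is reached last
print(checks_until_match(ua, common_first))    # the common string is tested first
```

With a common substring at the front, most spiders are recognized after a single comparison. Visitors that are not spiders still cost a full pass through the list, which is why a longer list slows every page load.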
Added "ebot" (become.com)
Added "searchbot" (eliyon.com)
Added "pear." (PEAR HTTP_REQUEST class)
Added diamondbot (Gator/Claria)
Added to small list: scrubby
Added to both lists: sohu
Added: myweb, worldlight
Removed from smaller list: linkalarm (link validator, not a spider). I kept it in the larger list.
New spiders added:
goforit
larbin (was already in spiders-large.txt)
New active spiders added:
booch
iltrovatore
linkalarm
nameprotect
poppelsdorf
This upload further optimizes the spiders.txt file, and includes an optional spiders-large.txt, based on ChrisW123's upload. The larger file will catch additional spiders (which may not be active nowadays) at the cost of slower page loads. The large file has also been optimized and corrected somewhat.
If you have any suggestions or changes for this contribution, please contact me first so that we can work together. Thanks.
I've taken Steve's list and compared it to my huge list of spiders, and added the missing ones from his into mine, to create this update. All other files are included (which is just the readme.txt file).
Thanks Steve for this contribution!