Go Back   Site Owners Forums - Webmaster Forums > General Arena > General Discussion

Notices


Reply
 
Thread Tools Rate Thread Display Modes
Old 11-06-2012, 05:27 AM   #1
Fking
Registered User
 
Join Date: May 2010
Posts: 14
Question Evil bots consuming all my sites traffic even after i disallowed them in robots.txt

I have a few sites on a shared hosting where each has 1GB transfer per month. Those are super low traffic sites and i thought that won't be an issue but, take a look at this:
https://dl.dropbox.com/u/8346986/robots.jpg

That's just for the first 5 days of this month, and when the last month I put robots.txt like this:

Code:
# Begin block Bad-Robots from robots.txt
User-agent: robot
Disallow:/
User-agent: bot
Disallow:/
User-agent: spider
Disallow:/
User-agent: crawl
Disallow:/
User-agent: spider
Disallow:/
User-agent: asterias
Disallow:/
User-agent: BackDoorBot/1.0
Disallow:/
User-agent: Black Hole
Disallow:/
User-agent: BlowFish/1.0
Disallow:/
User-agent: BotALot
Disallow:/
User-agent: BuiltBotTough
Disallow:/
User-agent: Bullseye/1.0
Disallow:/
User-agent: BunnySlippers
Disallow:/
User-agent: Cegbfeieh
Disallow:/
User-agent: CheeseBot
Disallow:/
User-agent: CherryPicker
Disallow:/
User-agent: CherryPickerElite/1.0
Disallow:/
User-agent: CherryPickerSE/1.0
Disallow:/
User-agent: CopyRightCheck
Disallow:/
User-agent: cosmos
Disallow:/
User-agent: Crescent
Disallow:/
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow:/
User-agent: DittoSpyder
Disallow:/
User-agent: EmailCollector
Disallow:/
User-agent: EmailSiphon
Disallow:/
User-agent: EmailWolf
Disallow:/
User-agent: EroCrawler
Disallow:/
User-agent: ExtractorPro
Disallow:/
User-agent: Foobot
Disallow:/
User-agent: Harvest/1.5
Disallow:/
User-agent: hloader
Disallow:/
User-agent: httplib
Disallow:/
User-agent: humanlinks
Disallow:/
User-agent: InfoNaviRobot
Disallow:/
User-agent: JennyBot
Disallow:/
User-agent: Kenjin Spider
Disallow:/
User-agent: Keyword Density/0.9
Disallow:/
User-agent: LexiBot
Disallow:/
User-agent: libWeb/clsHTTP
Disallow:/
User-agent: LinkextractorPro
Disallow:/
User-agent: LinkScan/8.1a Unix
Disallow:/
User-agent: LinkWalker
Disallow:/
User-agent: LNSpiderguy
Disallow:/
User-agent: lwp-trivial
Disallow:/
User-agent: lwp-trivial/1.34
Disallow:/
User-agent: Mata Hari
Disallow:/
User-agent: Microsoft URL Control - 5.01.4511
Disallow:/
User-agent: Microsoft URL Control - 6.00.8169
Disallow:/
User-agent: MIIxpc
Disallow:/
User-agent: MIIxpc/4.2
Disallow:/
User-agent: Mister PiX
Disallow:/
User-agent: moget
Disallow:/
User-agent: moget/2.1
Disallow:/
User-agent: NetAnts
Disallow:/
User-agent: NICErsPRO
Disallow:/
User-agent: Offline Explorer
Disallow:/
User-agent: Openfind
Disallow:/
User-agent: Openfind data gathere
Disallow:/
User-agent: ProPowerBot/2.14
Disallow:/
User-agent: ProWebWalker
Disallow:/
User-agent: QueryN Metasearch
Disallow:/
User-agent: RepoMonkey
Disallow:/
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow:/
User-agent: RMA
Disallow:/
User-agent: SiteSnagger
Disallow:/
User-agent: SpankBot
Disallow:/
User-agent: spanner
Disallow:/
User-agent: suzuran
Disallow:/
User-agent: Szukacz/1.4
Disallow:/
User-agent: Teleport
Disallow:/
User-agent: TeleportPro
Disallow:/
User-agent: Telesoft
Disallow:/
User-agent: The Intraformant
Disallow:/
User-agent: TheNomad
Disallow:/
User-agent: TightTwatBot
Disallow:/
User-agent: Titan
Disallow:/
User-agent: toCrawl/UrlDispatcher
Disallow:/
User-agent: True_Robot
Disallow:/
User-agent: True_Robot/1.0
Disallow:/
User-agent: turingos
Disallow:/
User-agent: URLy Warning
Disallow:/
User-agent: VCI
Disallow:/
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow:/
User-agent: Web Image Collector
Disallow:/
User-agent: WebAuto
Disallow:/
User-agent: WebBandit
Disallow:/
User-agent: WebBandit/3.50
Disallow:/
User-agent: WebCopier
Disallow:/
User-agent: WebEnhancer
Disallow:/
User-agent: WebmasterWorldForumBot
Disallow:/
User-agent: WebSauger
Disallow:/
User-agent: Website Quester
Disallow:/
User-agent: Webster Pro
Disallow:/
User-agent: WebStripper
Disallow:/
User-agent: WebZip
Disallow:/
User-agent: WebZip/4.0
Disallow:/
User-agent: Wget
Disallow:/
User-agent: Wget/1.5.3
Disallow:/
User-agent: Wget/1.6
Disallow:/
User-agent: WWW-Collector-E
Disallow:/
User-agent: Xenu's
Disallow:/
User-agent: Xenu's Link Sleuth 1.1c
Disallow:/
User-agent: Zeus
Disallow:/
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow:/
# Begin Exclusion From Directories from robots.txt
Disallow: /cgi-bin/
it's basically blocking all known spider bots but google, bing and yahoo
and as you can see
robot, bot, spider, crawl are not respecting that at all

so few questions here

1. Does anybody know who runs those bots and why they don't respect robots.txt
2. What's up with googlebot consuming 660mb for 5 days? Aren't they supposed to NOT be aggressive like that. There was a video where Matt Cuts explains how they are extra careful to not crawl sites too fast and aggressive since this might cause problems to smaller hosts.
3. if i add the line:
Disallow:/
User-agent: *bot

since this is the ID of one of the bots, will that also disallow "Googlebot" or in robots.txt * is literally * not a catch all symbol?


answering on any of the 3 questions will be appreciated
__________________

To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
Fking is offline   Reply With Quote

Old 12-13-2013, 11:27 PM   #2
Ryan smith
Registered User
 
Join Date: Dec 2013
Posts: 942
ohhhhhhhhhhhh
__________________

To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
Ryan smith is offline   Reply With Quote
Reply


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Traffic Increase On My Sites alen12345 Traffic Corner 17 06-06-2016 12:49 AM
Sites Losing Traffic webguru11 Search Engine Optimization 1 10-02-2014 03:09 AM
Sites Losing Traffic webguru11 Search Engine Optimization 1 01-22-2013 01:55 AM
Increase traffic sites ooffrrttjaaj Traffic Corner 18 09-11-2012 10:44 PM
How to generate real, targeted business traffic sites? eooc.box Search Engine Optimization 3 06-15-2012 02:46 AM


All times are GMT -7. The time now is 03:51 AM.


Powered by vBulletin Copyright © 2020 vBulletin Solutions, Inc.