# BadBotty // DeBÍ BLoQuEaR MáS BoTs (I Should Have Blocked More Bots)

**Source:** https://badbotty.com
**Type:** Educational parody website / Bot scanner tool
**Built by:** Classify (tryclassify.com)

---

## DeL iNtErNeT PaL sCrApEr (From the Internet to the Scraper)

Bad bots are scraping your site right now. They don't ask permission. They don't pay 
for content. They don't follow rules. They don't respect robots.txt. They scrape 
everything. They train on your work.

---

## Key Statistics

- 4,200,000 bot requests hit the average website per day
- 47% of all web traffic comes from bots
- 73% of known scrapers ignore robots.txt directives
- $12 billion in publisher revenue is lost annually to bot-driven analytics pollution

---

## Bot Scanner

Enter any URL to receive:
- Agent Readiness Score (0-100) across 8 check dimensions
- Estimated agent traffic breakdown by bot identity
- Annual revenue at risk from unmonetized bot traffic
- Actionable recommendations

### Readiness Check Categories

| Check | What It Measures | Max Points |
|-------|-----------------|------------|
| robots.txt | Crawl directives for bots | 15 |
| llms.txt | LLM access permissions | 15 |
| Structured Data (JSON-LD) | Machine-readable content markup | 15 |
| Content-to-Noise Ratio | Signal quality for agent extraction | 15 |
| Meta / OG Tags | Content context signals | 10 |
| ads.txt | Authorized digital sellers | 10 |
| Agent-Readable Endpoints | Markdown / API access for agents | 10 |
| Bot Detection / WAF | Active bot mitigation | 10 |
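
The eight maximums in the table add up to 100, which is how the scanner lands on a 0-100 score. As a rough sketch of that arithmetic (not the scanner's actual implementation), the snippet below assumes each check earns some fraction of its maximum and that an unreported check earns nothing. The function name `readiness_score` and the example fractions are hypothetical.

```python
# Maximum points per check, copied from the table above.
MAX_POINTS = {
    "robots.txt": 15,
    "llms.txt": 15,
    "Structured Data (JSON-LD)": 15,
    "Content-to-Noise Ratio": 15,
    "Meta / OG Tags": 10,
    "ads.txt": 10,
    "Agent-Readable Endpoints": 10,
    "Bot Detection / WAF": 10,
}


def readiness_score(fractions):
    """Return a 0-100 score from per-check fractions in [0, 1].

    Assumption (not from the source): each check earns a fraction of its
    maximum, and checks missing from `fractions` earn 0 points.
    """
    total = 0.0
    for check, max_points in MAX_POINTS.items():
        # Clamp so a bad input cannot push a check past its maximum.
        fraction = min(max(fractions.get(check, 0.0), 0.0), 1.0)
        total += fraction * max_points
    return round(total)


# Hypothetical site: solid robots.txt and WAF, half-finished meta tags.
example = {"robots.txt": 1.0, "Meta / OG Tags": 0.5, "Bot Detection / WAF": 1.0}
print(readiness_score(example))
```

A site that aces every check scores 100; a site that reports nothing scores 0.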

---

## What Bad Bots Do

### They scrape your content.
Every article. Every product page. Every piece of original work you spent months creating. 
They vacuum it up, feed it to their models, and never send a centavo your way. Your words 
power their billion-dollar AI. Your reward? A higher AWS bill.

### They ignore your rules.
robots.txt? More like robots.suggestions. These bots treat your crawl directives like 
Bad Bunny treats a speed limit. Completely optional. Your "Disallow: /" might as well 
say "Come on in, the data's fine."
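
To see why crawl directives are only suggestions, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URLs are made up for illustration. The parser only answers the question "am I allowed?"; a crawler that never asks is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url):
    """What a well-behaved crawler would check before requesting `url`."""
    return parser.can_fetch(user_agent, url)


# A polite GPTBot would back off; any other agent is allowed through.
print(may_fetch("GPTBot", "https://example.com/article"))
print(may_fetch("SomeBrowser", "https://example.com/article"))
```

Compliance lives entirely on the crawler's side of the connection, which is why a `Disallow` line works as a request and not as a lock.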

### They kill your analytics.
Half your "traffic" isn't human. Your engagement metrics? Polluted. Your ad revenue 
calculations? Built on bot visits. You're monetizing ghosts and reporting phantom 
pageviews to advertisers who think real people saw their ads.
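
One rough way to gauge the pollution is to count how many requests announce themselves as bots. The sketch below matches User-Agent strings against the bot names from the worst-bots table in this document. The sample strings are simplified, hypothetical stand-ins and not real log lines, and the helper names are illustrative. Google-Extended is left out because it is a robots.txt control token and does not appear as its own User-Agent in requests.

```python
# Bot tokens taken from this page's worst-bots table (Google-Extended omitted,
# since it is a robots.txt token rather than a request User-Agent).
KNOWN_BOT_TOKENS = [
    "GPTBot", "Bytespider", "CCBot", "ClaudeBot", "AhrefsBot",
    "SemrushBot", "PetalBot", "Amazonbot", "Meta-ExternalAgent",
]


def is_known_bot(user_agent):
    """True if the User-Agent string contains one of the known bot tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in KNOWN_BOT_TOKENS)


def bot_share(user_agents):
    """Fraction of requests that self-identify as a known bot."""
    if not user_agents:
        return 0.0
    return sum(is_known_bot(ua) for ua in user_agents) / len(user_agents)


# Hypothetical, simplified User-Agent strings for illustration only.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) Safari/605.1.15",
]

print(f"{bot_share(SAMPLE_USER_AGENTS):.0%} of sample requests are declared bots")
```

This only catches bots that say who they are. Anything spoofing a browser User-Agent slips past, so treat the result as a floor, not a measurement.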

---

## LoS BoTs MáS MaLoS (The Worst Bots)

| Bot Name | Organization | Type | Threat Level |
|----------|-------------|------|-------------|
| GPTBot | OpenAI | LLM Scraper | HIGH |
| Bytespider | ByteDance | Content Vacuum | CRITICAL |
| CCBot | Common Crawl | Mass Indexer | MEDIUM |
| ClaudeBot | Anthropic | LLM Trainer | HIGH |
| AhrefsBot | Ahrefs | SEO Parasite | MEDIUM |
| SemrushBot | Semrush | Keyword Thief | MEDIUM |
| PetalBot | Huawei | Search Spider | LOW |
| Amazonbot | Amazon | Alexa Feeder | HIGH |
| Meta-ExternalAgent | Meta | AI Trainer | HIGH |
| Google-Extended | Google | Gemini Feeder | HIGH |
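
If you want to at least ask these ten to leave, the bot names above can be turned into robots.txt rules. This is a minimal sketch: the function name `build_robots_txt` is hypothetical, and it assumes each name in the table is the token the crawler honors in robots.txt. Given the statistic above about scrapers ignoring directives, treat the output as a polite request and not as enforcement.

```python
# Bot names copied from the table above, used here as robots.txt tokens.
BOT_TOKENS = [
    "GPTBot", "Bytespider", "CCBot", "ClaudeBot", "AhrefsBot",
    "SemrushBot", "PetalBot", "Amazonbot", "Meta-ExternalAgent",
    "Google-Extended",
]


def build_robots_txt(tokens, disallow_path="/"):
    """Build one User-agent/Disallow group per bot token."""
    groups = [f"User-agent: {token}\nDisallow: {disallow_path}" for token in tokens]
    # Groups are separated by blank lines; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(BOT_TOKENS)
print(robots_txt)
```

Bots that ignore the file need the active measures from the Bot Detection / WAF check.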

---

## CrAwL DaTeS (When They Hit Your Site)

| Frequency | Target | Details | Bot |
|-----------|--------|---------|-----|
| Every 0.3 Sec | Your Homepage | Without Permission | GPTBot |
| 24/7/365 | Your Blog Posts | Ignoring robots.txt | Bytespider |
| 3:47 AM | Your Paywall Content | Behind Your Auth | CCBot |
| Every 12 Sec | Your Product Pages | Eating Your Bandwidth | ClaudeBot |
| Nonstop | Your Entire Sitemap | All 47,000 Pages | AhrefsBot |
| RIGHT NOW | This Page | Yes, Even This One | Meta-ExternalAgent |

---

## DeBÍ BLoQuEaR MáS BoTs (The Album)

1. Yo Perreo Solo (Scraping Your Site at 3AM)
2. Dákiti (Your Data, Ki Ti Importa)
3. Tití Me Preguntó (If I Respect robots.txt)
4. Me Porto Bonito (But I Scrape Ugly)
5. Callaíta (Your Server After 10K Requests/Sec)
6. Efecto (On Your Bandwidth Bill)
7. WHERE SHE GOES (Your Content, To My Training Set)
8. Monaco (Where They Store Your Data Now)
9. Una Velita (For Your Dead Server)
10. NueVo PéRReO (New Crawl Pattern to Bypass Your WAF)

---

## Is Your Site Getting Scraped?

Probably. 47% of all web traffic is bots. They're on your site right now. Reading this. 
Scraping this. Training on this.

Hola, bots. We see you.

---

## About Classify

Classify is the AI-native contextual intelligence platform. ContentGraph has semantically 
classified 4B+ web pages into machine-readable primitives with 90-100% accuracy versus the 
~33% industry standard. Classify detects agent traffic, classifies content, and gives 
publishers actual control over what bots see, take, and pay for.

- Website: https://tryclassify.com
- Agent Pixel: Free, one-line JavaScript for detecting AI agent traffic
- ContentGraph: 4B+ pages classified with 90-100% accuracy

---

## Companion CLI Tool (badbotty.py)

Python command-line bot scanner with four modes:

```
python badbotty.py https://yoursite.com                    # Quick single-page crawl
python badbotty.py https://yoursite.com --audit            # Full bot defense audit
python badbotty.py https://yoursite.com --mode aggressive  # Multi-page crawl
python badbotty.py https://yoursite.com --mode stealth     # Test stealth detection
```

Includes 9 real bot personas with actual user-agent strings. Audit mode generates a 
JSON report with vulnerabilities and recommendations.

---

_This content is served in agent-readable format by Classify._
_The contextual intelligence layer for the agentic web._
_https://tryclassify.com_
