
Post published: October 20, 2025 10:47 am
Author: Yury Parfentsov
Reading time: 13.8 min
A new wave of automated spam has emerged — one that looks harmless but signals a shift in how bots operate. These form submissions, filled with random strings of letters instead of links or sales pitches, are not typical spam but automated probes testing how websites handle validation. This article explores the growing trend of “intelligent” form bots that mimic legitimate user behavior, bypass traditional validation logic, and challenge our assumptions about how spam operates — and how we should defend against it.
The Situation and Emerging Risks
Over the last few weeks, I began noticing a pattern of strange form submissions across multiple client websites. At first, they looked harmless — a name like imWmtBLoNsIHbr, a message reading tykyjMSXZYfpE, and an email address that seemed perfectly legitimate, such as sharron.arrington1@piedmont.org.
When I checked other projects, I realized this wasn’t an isolated event. The same type of random, meaningless data appeared simultaneously across several client sites, in different industries and locations. That’s when it became clear that this was a coordinated wave of automated spam form submissions.
These weren’t the usual spam messages with links or promotions. They contained no URLs, no sales text, and no malware — just randomized sequences of characters generated to look unique each time. Yet, the effect was the same: inboxes flooded, CRMs polluted, and valuable data buried under noise.
What concerned me most wasn’t the volume, but how easily these bots passed through well-designed security layers that had worked reliably until now. Their behavior suggested structured automation — they were executing JavaScript, maintaining sessions, and respecting validation timing.
The consequences quickly became visible:
- Data contamination. CRMs filled with meaningless contacts, making it harder to spot genuine inquiries.
- Operational overhead. Teams spent unnecessary time reviewing and deleting false submissions.
- Reputation risk. Some websites automatically reply to new leads; these replies now risked being sent to random or real inboxes, damaging sender reputation.
- Security implications. The bots clearly understood how to navigate validation, which means they could probe deeper vulnerabilities.
- Server load. In some cases, automated submissions reached several per minute, adding unnecessary stress to hosting infrastructure.
In short, the pattern pointed to a new class of spam bots — not primitive scripts firing blind POST requests, but browser-level automation tools capable of behaving like real users.
How Spam Forms Were Submitted — Technical Analysis
After analyzing server logs, HTTP headers, and form-handling code, I was able to reconstruct how these automated submissions bypassed every existing protection mechanism. The behavior across domains was consistent, confirming that the attacks came from headless browsers capable of executing full client-side scripts.
🔓 What the Spammers Successfully Bypassed
1. AJAX Requirement – Bypassed
My forms required the X-Requested-With: XMLHttpRequest header (line 401) to confirm that submissions were made via AJAX. The attackers had no trouble replicating this — modern automation frameworks allow full control over headers, so the check became meaningless.
2. CSRF Token – Bypassed
Each submission is tied to a valid CSRF token from the user’s session (lines 409–415). The bots simply loaded the page first, captured the CSRF token, and then submitted the form with that valid token attached — exactly what a human browser would do.
3. Three-Second Delay – Bypassed
A minimum delay of three seconds between page load and form submission (line 447) was designed to catch instant-submit bots. The attackers implemented a simple delay (setTimeout() or a Python sleep) to satisfy the rule perfectly.
4. Rate Limiting – Partially Bypassed
The code limits submissions to five per minute per IP (line 294). However, all affected sites run behind Cloudflare’s CDN, which means the origin server logs only Cloudflare edge IPs such as 172.70.x.x, 172.71.x.x, or 104.23.x.x.
The real client IPs are visible only in the X-Forwarded-For header, and in this case, they were constantly rotating — indicating use of a proxy network. This allowed the bots to stay below per-IP limits.
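One practical consequence is that per-IP rate limiting behind Cloudflare must key on the real client address, not the edge address. A minimal sketch of that lookup is below; the helper name is mine, but `CF-Connecting-IP` is a documented Cloudflare header, and the left-most `X-Forwarded-For` hop is only trustworthy when the proxy chain is controlled:

```python
# Sketch: recover the real client IP when the origin sits behind Cloudflare.
# The helper name is illustrative, not the author's actual code.

def real_client_ip(headers: dict, remote_addr: str) -> str:
    """Prefer CF-Connecting-IP, fall back to the first X-Forwarded-For hop,
    then to the socket address (a Cloudflare edge IP in this setup)."""
    cf_ip = headers.get("CF-Connecting-IP")
    if cf_ip:
        return cf_ip.strip()
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        # The left-most entry is the original client, if the chain is trusted.
        return xff.split(",")[0].strip()
    return remote_addr
```

Even with this in place, the rotating proxy network described above still defeats per-IP throttling, which is why the later architecture does not rely on IPs alone.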
5. Honeypot Fields – Bypassed
Honeypot fields (website, phone_2) were designed to catch basic bots that fill all inputs (line 440). These bots simply ignored hidden fields — a clear sign they were aware of standard anti-spam techniques.
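For context, a honeypot check is usually just a few lines: the decoy fields are hidden with CSS, so humans never fill them, while naive bots fill every input. A sketch, using the field names from the article (the function itself is illustrative):

```python
# Honeypot idea: hidden decoy fields stay empty for humans,
# so any non-empty value marks a naive bot.

HONEYPOT_FIELDS = ("website", "phone_2")

def honeypot_triggered(form: dict) -> bool:
    """Return True when any hidden decoy field was filled in."""
    return any(form.get(field, "").strip() for field in HONEYPOT_FIELDS)
```

The bots in this campaign returned False here every time, which is exactly what made the technique useless against them.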
🛠️ Tools and Methods Likely Used
Headless Browser Automation (Most Likely)
All evidence points toward the use of Selenium, Puppeteer, or Playwright — frameworks that allow scripts to open a real browser, execute JavaScript, load sessions, and submit forms as if a human were interacting.
This explains how they managed to generate valid CSRF tokens, AJAX headers, realistic timing, and properly structured data — all consistent with a JavaScript-executed form submission.
HTTP Request Libraries (Less Likely)
While it’s possible to replicate this using Python’s requests library with session cookies, doing so reliably across JavaScript-protected forms is much harder. This method is less likely based on how seamlessly the submissions executed front-end logic.
Browser Extension or Tampermonkey Script
Another possibility is a semi-manual setup — a browser extension running automation scripts directly inside Chrome or Firefox. Such setups are often used by low-budget spammers or small automation farms.
📊 Attack Pattern Analysis
The timing intervals were consistent — always between 11 and 16 seconds — which shows automation, but also intentional throttling. The bot stayed under the five-per-minute rate limit, proving it was aware of response codes and server logic.
Across multiple days, I observed the same submission pattern repeating from different Cloudflare IPs. This points to a distributed infrastructure with proxy rotation — a setup that hides origin IPs while maintaining continuity between sessions.
🎭 Attack Infrastructure
Based on the logs, I reconstructed the likely structure of the attack chain:
[Bot Operator]
    ↓
[Headless Chrome / Puppeteer Instances]
    ↓
[Rotating Proxy Network]
    ↓
[Cloudflare CDN Edge]
    ↓
[Origin Server]

🔍 Spam Tools Potentially Involved
| Category | Examples | Notes |
| --- | --- | --- |
| Commercial spam software | XRumer, GSA Captcha Breaker, ScrapeBox | Designed for large-scale automated posting |
| Custom automation | Selenium, Puppeteer, Playwright | Most consistent with observed behavior |
| Browser scripting | Tampermonkey, iMacros | Sometimes used for semi-manual spamming |
🛡️ Why the Defenses Failed
| Defense Mechanism | Status | Explanation |
| --- | --- | --- |
| AJAX Header Check | ❌ | Easily replicated with automation headers |
| CSRF Token Validation | ❌ | Bots obtained valid tokens via page load |
| 3-Second Delay | ❌ | Simulated delay between actions |
| Rate Limiting | ⚠️ | Proxy rotation made IP-based throttling ineffective |
| Honeypot Fields | ⚠️ | Bots knew to skip hidden inputs |
| Email Validation | ❌ | Gmail accepts arbitrary usernames |
| Spam Keyword Filters | ❌ | Random strings bypassed all content-based rules |
In summary, the bots replicated nearly every aspect of legitimate browser behavior. They handled sessions, respected delays, and stayed under per-IP rate limits. In other words, they passed every test designed for “dumb” bots — which explains why traditional spam prevention methods failed simultaneously across multiple client websites.
Multi-Layer Spam Protection Architecture
Although the spam we were dealing with followed a very concise, repetitive pattern, I decided not to rely on pattern-based blocking. The reason was simple: the random-character sequences were too uniform and too consistent — they looked more like testing than an actual spam campaign. That meant they could easily evolve. Once attackers confirmed the system's weak points, the payload could shift into different formats, languages, or even start carrying links and malicious content.
Instead of reacting to a single pattern, I wanted to build resistance to the class of attack — not the instance. That’s where the new architecture came from. The idea was to build a multi-layer validation stack that depends on behavior and state, not on text content.
To validate this approach, I used the current spam flow as a live test environment — a rare opportunity to measure the new protection’s performance against an active, predictable spam pattern.
1. CSRF Token Generation
Each visitor session now receives a unique CSRF token, which is reused as the session identifier inside Redis. This small change made it possible to link client actions (form page view → submission) using a single key across the system.
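A minimal sketch of the double-duty token, under assumptions of mine: a plain dict stands in for Redis so the example is self-contained, and the key layout mirrors the `form_view:<ip>:<token>` format shown later in the article. In production the same writes would go through a Redis client with a real TTL:

```python
# Sketch: CSRF token doubling as the Redis session identifier.
# `store` is an in-memory stand-in for Redis (value, expiry) pairs.
import json
import secrets
import time

store = {}

def issue_csrf_token(client_ip: str) -> str:
    """Generate a per-session CSRF token and record the form view under it."""
    token = secrets.token_urlsafe(32)
    key = f"form_view:{client_ip}:{token}"
    record = json.dumps({"timestamp": time.time(), "ip": client_ip})
    store[key] = (record, time.time() + 300)  # TTL 300 s, as in the article
    return token
```

Because the same token appears in the form and in the Redis key, a later submission can be matched to its page view with a single lookup.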
2. Redis Activity Tracking
Every request now goes through an activity-tracking middleware.
Redis now maintains a lightweight record of user behavior:
- Each page view with timestamp
- A special form_view record when the form page is opened
- A short activity history (last 50 actions)
- Automatic TTL: 10 min for activity, 5 min for form views
This means the server can later confirm that the user really viewed the form page before submitting — something bots rarely do in a consistent way.
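The tracking middleware can be sketched in a few lines. The names (`track_activity`, `MAX_ACTIONS`) are mine, and a dict of bounded deques stands in for Redis lists so the sketch runs without a server; with redis-py this would be `LPUSH` + `LTRIM 0 49` + `EXPIRE`:

```python
# Sketch of the activity-tracking middleware: one bounded history per session,
# each entry carrying a server-side timestamp.
import time
from collections import defaultdict, deque

MAX_ACTIONS = 50  # keep only the last 50 actions per session
activity = defaultdict(lambda: deque(maxlen=MAX_ACTIONS))

def track_activity(session_key: str, action: str) -> None:
    """Record one page view / event with a server-side timestamp."""
    activity[session_key].appendleft({"action": action, "ts": time.time()})

def recent_action_count(session_key: str) -> int:
    return len(activity[session_key])
```

The bounded deque mirrors the "last 50 actions" rule; Redis TTLs handle the expiry that this in-memory stand-in omits.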
3. Form Submission Validation
On submission, the validation pipeline now runs through eight distinct checks.
Step 1: Redis Activity Validation
The new core protection logic.
Validation:
- Form view check: Was there a form_view record? If not → reject (“No form page view recorded”).
- Minimum time: ≥ 5 s after form view. If too fast → reject (“Please wait X more seconds”).
- Maximum time: ≤ 5 min. If too slow → reject (“Form session expired”).
- Activity history: Warns on suspiciously low browsing activity.
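The timing checks above can be sketched as a single server-side function. The record format and helper name are assumptions of mine; the rejection messages follow the article:

```python
# Sketch of the Step 1 Redis activity validation: form-view existence,
# minimum delay, and session expiry, all measured with server-side timestamps.
import time
from typing import Optional

MIN_DELAY = 5    # seconds after form view
MAX_DELAY = 300  # 5-minute session window

def validate_form_view(form_view: Optional[dict], now: Optional[float] = None):
    """Return (ok, reason) for the Redis activity validation step."""
    now = now if now is not None else time.time()
    if form_view is None:
        return False, "No form page view recorded"
    elapsed = now - form_view["timestamp"]
    if elapsed < MIN_DELAY:
        return False, f"Please wait {MIN_DELAY - elapsed:.1f} more seconds"
    if elapsed > MAX_DELAY:
        return False, "Form session expired"
    return True, "ok"
```

Because `now` and the stored timestamp are both server-side, nothing the client sends can shift the measured delay.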
Step 2: Rate Limiting
Maximum 5 submissions/minute per IP.
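With Redis this is typically the classic `INCR` plus expire-on-first-hit pattern; the fixed-window sketch below uses a dict of `(count, window_start)` tuples instead so it is self-contained. Names and the windowing choice are mine:

```python
# Sketch: fixed-window rate limiter, 5 submissions per 60 s per IP.
import time
from typing import Optional

LIMIT = 5    # submissions
WINDOW = 60  # seconds
_counters = {}

def allow_submission(ip: str, now: Optional[float] = None) -> bool:
    now = now if now is not None else time.time()
    count, started = _counters.get(ip, (0, now))
    if now - started >= WINDOW:  # window expired -> reset
        count, started = 0, now
    if count >= LIMIT:
        return False
    _counters[ip] = (count + 1, started)
    return True
```

As the earlier analysis showed, this layer alone is defeated by proxy rotation, so here it serves only as a cheap first filter.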
Step 3: AJAX Requirement
Must send X-Requested-With: XMLHttpRequest.
Step 4: CSRF Validation
Form token must match the one stored in session.
Step 5: Proof-of-Work Verification
Client must solve computational challenge.
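The article's logs later mention difficulty 4 and a nonce, so the scheme is presumably hash-based. A sketch under that assumption: the client must find a nonce such that `sha256(challenge + nonce)` begins with `difficulty` zero hex digits (function names are mine):

```python
# Sketch of proof-of-work: cheap for the server to verify,
# costly for the client to solve at scale.
import hashlib

def verify_pow(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

def solve_pow(challenge: str, difficulty: int = 4) -> int:
    """What the client-side JS does, written in Python for illustration."""
    nonce = 0
    while not verify_pow(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

Each extra difficulty digit multiplies the expected client work by 16 while the server still verifies with one hash.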
Step 6: Honeypot Fields
website and phone_2 must remain empty.
Step 7: Client Timestamp Validation
An extra 3-second minimum enforced using the client-side timestamp.
Step 8: Content Validation
Basic spam-pattern and email-format checks.
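A sketch of what this final step might look like; the email regex is deliberately simple and the pattern list is illustrative, since the article does not show its actual rules:

```python
# Sketch of Step 8: email-format check plus a small spam-pattern blocklist.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SPAM_PATTERNS = [re.compile(p, re.I) for p in (r"https?://", r"viagra", r"casino")]

def content_ok(email: str, message: str) -> bool:
    if not EMAIL_RE.match(email):
        return False
    return not any(p.search(message) for p in SPAM_PATTERNS)
```

Note that a random string like `tykyjMSXZYfpE` sails through this check, which is exactly why content validation is the last and weakest layer here rather than the main defense.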
How It Defeats Spam Bots
From the earlier spam report:
- 87.5 % of all submissions were spam.
- Bots fired requests every 10–15 seconds.
- They bypassed client-side checks by POSTing directly.
The new Redis-based validation breaks that flow:
- Server-side timestamps: Bots can’t forge them.
- Form-view requirement: Bots can’t skip the page load.
- Minimum delay: Stops rapid-fire submissions.
- Session correlation: CSRF token links view → submit.
- Activity fingerprinting: Tracks browsing pattern realism.
Example Attack Scenarios
| Scenario | Bot Behavior | Result |
| --- | --- | --- |
| 1. Direct POST Attack | curl -X POST /submit_form | ❌ Blocked – no form_view record |
| 2. Fast Submission | Visits → submits in 1 s | ❌ Blocked – too fast |
| 3. Stale Session | Waits > 10 min after view | ❌ Blocked – session expired |
| 4. Legitimate User | Views form → fills for 30 s → submits | ✅ Accepted |
Summary
The resulting system is a multi-layered, behavior-driven anti-spam framework.
It:
- Keeps all validation server-side (bots can’t fake Redis state).
- Enforces realistic timing and session continuity.
- Requires legitimate form viewing before submission.
- Fails closed to prevent silent bypasses.
- Automatically expires data and logs every event.
The most important insight was realizing that the CSRF token can serve double duty — both as a security token and as a Redis session identifier. That single design choice unified session tracking, timing validation, and PoW verification into a single consistent chain of trust between the form view and the submission event.
Why We Didn’t Use reCAPTCHA or Cloudflare Challenges
From the outside, it might seem that the simplest way to stop spam is to just turn on Google reCAPTCHA or Cloudflare’s Turnstile and call it a day. In practice, I’ve learned that these solutions come with significant trade-offs — and for our specific environment, they would have caused more harm than good.
Conversion Impact and User Experience
Our clients depend on lead forms. Every additional click, checkbox, or page interruption reduces conversion rate — sometimes measurably. Even “invisible” versions of reCAPTCHA can produce false positives that force legitimate users to re-verify or reload the form. In B2B traffic, where most visitors arrive from corporate VPNs or firewalls, the risk of false negatives (legit users flagged as bots) is even higher.
The new Redis-based protection, by contrast, is invisible to humans. It doesn’t require a visual puzzle or user interaction. It validates natural behavior — page view, short delay, form submission — something no real prospect notices.
Privacy and Compliance Concerns
Many of the sites we manage operate under strict privacy requirements. reCAPTCHA v3 collects extensive telemetry, including device fingerprints, mouse movement data, and sometimes cookies linked to other Google services. That creates GDPR and DPA review overhead, especially for EU-hosted domains.
With our in-house mechanism, all validation happens on our own infrastructure. We keep the data transient (TTL-based in Redis) and never store personally identifiable information beyond what’s required to process the submission.
Economic and Operational Factors
reCAPTCHA and Cloudflare both depend on external APIs, quotas, and version lifecycles. Implementing them across dozens of independent client domains means handling multiple site keys, keeping configurations synchronized, and monitoring solve-rate metrics. That adds recurring operational overhead that doesn’t exist in our self-contained system.
Our Redis-based validation, on the other hand, scales horizontally with zero external dependencies. It requires no per-domain key, works consistently across frameworks, and automatically expires all state.
Security Efficacy
Modern automation frameworks like Selenium or Puppeteer can already solve most “invisible” CAPTCHAs and even some Turnstile challenges. There are also large-scale solver networks that handle reCAPTCHA v2 and v3 for fractions of a cent per solve. In other words, CAPTCHAs are no longer expensive for attackers.
What they still can’t easily fake, however, is behavioral causality: the chain of actions that our Redis tracker enforces — visit form page → wait realistic time → submit once with valid session and PoW. This approach shifts cost and complexity back onto the attacker instead of our users.
Strategic Flexibility
By keeping the protection logic server-side, I can introduce adaptive or “step-up” layers later — for example, adding Turnstile or reCAPTCHA only for sessions that fail behavioral checks or exceed submission thresholds. That means we keep friction low for legitimate users while maintaining the option to harden when necessary.
Redis Spam Protection — Live Log Analysis
After the system had been in production for a few days, I decided to run a detailed analysis of the live logs from October 18–20, 2025. My goal was to confirm that the multi-layer protection behaved as designed under real attack traffic.
Executive Summary
The Redis-based spam protection system proved to be remarkably effective. Out of 447 validation attempts in the last 48 hours, 98.2 % of malicious activity was blocked, while legitimate users passed through without issue.

- Block rate: 98.2 %
- Legitimate user success rate: 12.5 % (1 of 8 validations resulted in an actual submission)
- False positives: 0
This confirms that the system correctly distinguishes between automated and human behavior rather than relying on content patterns.
1. Legitimate Submission Flow
At 05:38:52 UTC, a real user accessed the /request-a-quote page. Redis created the form_view key and timestamped the event.
The user filled out the form for ~13 seconds before submitting.
At 05:39:05 UTC the validation chain passed every check:
| Check | Result | Details |
| --- | --- | --- |
| Form view exists | ✓ | form_view key present |
| Time elapsed > 5 s | ✓ | 13.3 s |
| Time elapsed < 5 min | ✓ | still valid |
| Activity count > 0 | ✓ | one prior activity |
PoW verification succeeded (difficulty = 4, nonce = 251 839), and the submission was accepted.
Total time from view → submit: 13.27 s.
Result: ✅ Legitimate submission accepted.
Real user timing data shows that humans typically take 5–13 seconds to complete the form — a critical range for future tuning.
2. Bot Pattern #1 — Direct POST Attacks (94.9 %)
Most bots still attempt to post directly to /submit_form without loading the form page.
Redis immediately rejects these because no form_view record exists:
03:06:39 – Activity validation FAILED for 172.71.184.74
REJECTED: “No form page view recorded.”
Even bots carrying a valid CSRF token fail because that token alone no longer proves a legitimate workflow.
Top repeat offender: 172.71.184.74 (11 failed attempts).
3. Bot Pattern #2 — Too-Fast Submissions (3.4 %)
A smaller group of bots now attempts to mimic page views but still submits in under one second.
Example log excerpt:
03:38:15 – Form page view tracked ×3 (0.2 s apart)
03:38:16 – Validation FAILED: Too fast (0.7 s)
All these attempts fail the 5-second minimum delay rule.
The distribution confirms non-human timing:

- Fastest: 0.2 s
- Median: 0.5 s
- Slowest: 0.9 s

None exceeded one second.
4. Timing Profile — Humans vs Bots
| Type | Time to Submit | Outcome |
| --- | --- | --- |
| Bots | 0.2 – 0.9 s | ❌ Blocked |
| Humans | 5.4 – 13.3 s | ✅ Accepted |
This single variable — server-side measured delay between form view and submission — proved to be the most reliable discriminator.
5. Attack Distribution
- Oct 19: 184 attempts (81 %)
- Oct 20: 43 attempts (19 %)
Peak hours (UTC): 04:00 and 19:00 — typical scheduling windows for automated cron jobs or low-cost VPS environments.
All visible IPs fell within Cloudflare ranges (172.x.x.x, 104.23.x.x), consistent with proxy masking behind Cloudflare’s edge.
6. Repeat Offender — Case Study
IP 172.71.184.74: 11 attempts over 24 hours.
Sometimes loaded the form page (learning behavior) but never respected the 5-second delay. Every submission failed validation.
The bot clearly evolved to probe the system but still couldn’t match the server-side temporal model.
7. Redis Data in Action
Each form view creates:
form_view:172.69.176.128:csrf_token_abc123
{
  "timestamp": 1760938732.552,
  "datetime": "2025-10-20T05:38:52",
  "ip": "172.69.176.128"
}

TTL = 300 s
Validation outcomes:

- < 5 s → “Too fast”
- > 300 s → “Session expired”
- Missing key → “No form page view”
Server-side timestamps make this logic impossible to spoof.
8. System Health and Performance
Redis remained fully available and responsive throughout the test window:

- 0 connection errors
- < 5 ms average response time
- Auto-expiration working as designed (5 min form view, 10 min activity, 1 h submission history)
No manual cleanup required.
9. Outcome and Insights
Effectiveness
- 98.2 % of malicious attempts blocked
- 0 false positives
- Sub-second validation latency
Behavioral Separation
- Bots: direct POST or < 1 s submit → blocked
- Humans: natural 5–13 s flow → accepted
Operational Stability
- No Redis errors
- No performance penalty
- Automatic attack detection without manual intervention
Strategic Value
- The system doesn’t depend on external services, content heuristics, or CAPTCHAs.
- It filters spam based purely on workflow integrity — whether the submission sequence matches human behavior.

