gabrielesalati/cleanfeed

Cleanfeed spam-filter rules for INN/Usenet news servers — pattern sets that cut abuse and binary noise on a self-hosted NNTP feed.

Gabriele Salati 6e45f95927 Initial public release		2026-05-23 02:48:35 +02:00

Cleanfeed Enhanced - 2025 Usenet Spam Filter Update

Version: 2025.1
Updated: 2025-10-12
Based on: Cleanfeed by Steve Crook (https://github.com/crooks/cleanfeed)

📋 Overview

This enhanced version of Cleanfeed brings the classic Usenet spam filter up to date with 2025 spam patterns while maintaining its proven effectiveness. After comprehensive analysis of modern Usenet spam campaigns, this update addresses:

Critical Security Threats: URL shortener malware campaigns (STOP ransomware)
Modern Spam Patterns: Cryptocurrency scams, NFT fraud, investment spam
Gateway Abuse: Mail-to-news gateway spam (bofh.it, fidonet.org, pugleaf.net)
Privacy Protection: Enhanced whitelist for Tor and anonymous remailers
Obsolete Rule Removal: Cleaned up pre-2020 patterns and Google Groups rules

🎯 Key Improvements

✅ Added (New for 2025)

URL Shortener Blocking - Blocks SURBL top-10 abused shorteners (bit.ly, t.co, etc.)
Cryptocurrency/NFT Spam Detection - Patterns for bitcoin investment scams, crypto robots
Phishing Pattern Detection - "Verify your account", "suspended account", etc.
Gateway Abuse Rules - Enhanced scrutiny for known spam gateways
Anonymous Service Protection - Whitelist for 11 remailers + all .onion addresses
Modern Message-ID Patterns - Updated tracker ID and numeric spam detection
Enhanced Rate Limiting - Multi-level rate controls (per-host, per-From, burst detection)
Forged Freemail Detection - Catches forged Gmail/Yahoo/Outlook headers

❌ Removed (Obsolete for 2025)

Google Groups Rules - Google Groups stopped carrying new Usenet content on February 22, 2024
Pre-2020 Spam Sources - 'PostIT Now', 'AudioWeb', '@darkshado.ca'
Overly Aggressive Binary Detection - Now hierarchy-specific
Site-Specific Rules - MI5 keywords and other non-general patterns

🔧 Updated

Crossposting Limits - Reduced from 14 to 8 max groups
Scoring Thresholds - Retuned and re-enabled (was disabled in many configs)
Binary Detection - Now whitelist-based for appropriate hierarchies
Spam Keywords - Complete refresh with 2024-2025 patterns
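
To see what the crossposting limit means for a given article, the groups in its Newsgroups header can be counted and compared with the limit. This is a rough sketch under the assumption that an article is over the limit when it names more than 8 groups; `MAXGROUPS`, `count_groups`, and `over_crosspost_limit` are illustrative names, not part of cleanfeed.

```bash
MAXGROUPS=8   # limit stated in this README

# count_groups: print the number of non-empty, comma-separated newsgroups
# in the Newsgroups header value passed as $1.
count_groups() {
    printf '%s\n' "$1" |
        awk -F',' '{ n = 0; for (i = 1; i <= NF; i++) if ($i !~ /^[ \t]*$/) n++; print n }'
}

# over_crosspost_limit: succeed if $1 names more than MAXGROUPS groups.
over_crosspost_limit() {
    [ "$(count_groups "$1")" -gt "$MAXGROUPS" ]
}
```

`count_groups 'comp.lang.perl.misc,news.software.nntp'` prints 2, so that header would pass under either the old or the new limit.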

📦 Files Included

cleanfeed/
├── README.md                      # This file
├── cleanfeed.local                # Main enhanced configuration
├── spam_patterns_2025.txt         # Comprehensive pattern library
├── gateway_rules.conf             # Gateway abuse detection
└── anonymous_protection.conf      # Anonymous service whitelist

🚀 Installation

Prerequisites

INN (InterNetNews) server
Perl 5.10+
Existing cleanfeed installation (or fresh install from https://github.com/crooks/cleanfeed)

Option 1: Fresh Installation

If you don't have cleanfeed yet:

# 1. Download original cleanfeed
cd ~news/bin/filter/
wget https://raw.githubusercontent.com/crooks/cleanfeed/master/cleanfeed
chmod 755 cleanfeed

# 2. Copy enhanced configuration
# (/path/to/cleanfeed-enhanced is a placeholder for your checkout of this repository)
cp /path/to/cleanfeed-enhanced/cleanfeed.local ~news/bin/filter/cleanfeed.local
chown news:news ~news/bin/filter/cleanfeed.local
chmod 644 ~news/bin/filter/cleanfeed.local

# 3. Test configuration
su - news
perl -c ~news/bin/filter/cleanfeed

# 4. Configure INN to use cleanfeed
# Edit /etc/news/filter/filter_innd.pl or filter_nnrpd.pl
# Add: require 'cleanfeed';

Option 2: Upgrade Existing Installation

If you already have cleanfeed:

# 1. BACKUP existing configuration
cp ~news/bin/filter/cleanfeed.local ~news/bin/filter/cleanfeed.local.backup.$(date +%Y%m%d)

# 2. Review differences between your config and new config
# (/path/to/cleanfeed-enhanced is a placeholder for your checkout of this repository)
diff ~news/bin/filter/cleanfeed.local /path/to/cleanfeed-enhanced/cleanfeed.local

# 3. Merge or replace
# OPTION A: Replace completely (recommended if using default config)
cp /path/to/cleanfeed-enhanced/cleanfeed.local ~news/bin/filter/cleanfeed.local

# OPTION B: Merge manually (if you have custom rules)
# - Keep your custom rules
# - Add new patterns from cleanfeed.local
# - Remove obsolete rules (Google Groups, etc.)

# 4. Update file permissions
chown news:news ~news/bin/filter/cleanfeed.local
chmod 644 ~news/bin/filter/cleanfeed.local

# 5. Test syntax
su - news
perl -c ~news/bin/filter/cleanfeed

🧪 Testing Before Deployment

CRITICAL: Test in shadow mode before full deployment!

Phase 1: Syntax Validation

# Check Perl syntax
su - news
perl -c ~news/bin/filter/cleanfeed

# Check for compilation errors with warnings enabled (-c compiles without running the script)
perl -wc ~news/bin/filter/cleanfeed

Phase 2: Shadow Mode (Recommended: 1-2 weeks)

Enable shadow mode to log spam decisions without actually rejecting posts:

# In cleanfeed.local, add at top:
$shadow_mode = 1;  # Log only, don't reject

# Restart INN
systemctl restart inn-server
# or
/etc/init.d/innd restart

# Monitor shadow mode logs
tail -f /var/log/news/news.notice | grep cleanfeed

Review logs daily:

Check for false positives (legitimate posts being flagged)
Verify spam is being detected
Tune thresholds if needed
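
For the daily review, it can help to see which rules fire most often. The sketch below assumes each log line ends with a `reason: <text>` field, which is the format implied by the grep patterns in the Monitoring section. The function name `summarize_reasons` is hypothetical, and the real log layout on your server may differ.

```bash
# summarize_reasons: read log lines on stdin and print "count reason",
# most frequent first. Lines without a "reason: " field are ignored.
summarize_reasons() {
    sed -n 's/.*reason: //p' | sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

A typical call would be `grep cleanfeed /var/log/news/news.notice | summarize_reasons | head -20`. A rule that dominates the list while also showing up in false positive reports is the first candidate for tuning.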

Phase 3: Soft Launch (1-2 weeks)

Enable rejections with conservative threshold:

# In cleanfeed.local:
$shadow_mode = 0;              # Disable shadow mode
$spam_score_threshold = 20;    # Conservative (default: 15)

# Restart INN
systemctl restart inn-server

# Monitor closely
tail -f /var/log/news/news.notice
tail -f /var/log/news/errlog

Watch for:

User complaints about false positives
Spam getting through (threshold too high)
Server performance impact

Phase 4: Full Deployment

Lower threshold to recommended level:

# In cleanfeed.local:
$spam_score_threshold = 15;    # Standard threshold

Continue monitoring for first month.

🧪 Test Cases

Legitimate Posts (Must NOT Reject)

Test with these scenarios to ensure no false positives:

# 1. Binary post in alt.binaries.*
# Expected: ACCEPT

# 2. Crosspost to 5 related groups with Followup-To
# Expected: ACCEPT

# 3. Anonymous remailer post (from .onion or known remailer)
# Expected: ACCEPT (with adjusted scoring)

# 4. First-time poster with reasonable content
# Expected: ACCEPT (or tempfail at worst, not reject)

# 5. Technical discussion with 2-3 URLs to legitimate sources
# Expected: ACCEPT

# 6. Post with Base64 code example (< 20 lines)
# Expected: ACCEPT if in appropriate group

Spam Posts (Must Reject)

Test with these scenarios to ensure spam detection works:

# 1. Crosspost to 15+ unrelated groups
# Expected: REJECT

# 2. Post with t.co URL shortener link
# Expected: REJECT (malware risk)

# 3. Post with "bitcoin investment" + "guaranteed returns"
# Expected: REJECT (high spam score)

# 4. Post from forged Gmail without authentication
# Expected: REJECT or high score

# 5. Message-ID matching spam pattern (pure numeric, etc.)
# Expected: REJECT

# 6. Binary content in comp.lang.* hierarchy
# Expected: REJECT

# 7. Multiple URL shorteners in body
# Expected: REJECT IMMEDIATELY

# 8. Post exceeding rate limit (30/hour from single host)
# Expected: TEMPFAIL or REJECT
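
Test case 8 can be checked offline by counting posts per host inside a one-hour window. The sketch assumes a hypothetical input format of one `epoch_seconds host` pair per line (extracting that from your logs is left out), and `RATE_LIMIT` and `hosts_over_limit` are illustrative names rather than filter settings.

```bash
RATE_LIMIT=30   # posts per hour per host, taken from test case 8

# hosts_over_limit NOW: read "epoch_seconds host" lines on stdin and print
# each host with more than RATE_LIMIT posts in the hour ending at NOW.
hosts_over_limit() {
    awk -v now="$1" -v limit="$RATE_LIMIT" '
        $1 > now - 3600 && $1 <= now { n[$2]++ }
        END { for (h in n) if (n[h] > limit) print h }
    ' | sort
}
```

Called as `hosts_over_limit "$(date +%s)" < posts.txt`, it lists the hosts that the rate limiter would be expected to tempfail or reject.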

🔧 Configuration Customization

Adjusting Spam Score Threshold

# In cleanfeed.local:
$spam_score_threshold = 15;    # Default

# More aggressive (catches more spam, more false positives):
$spam_score_threshold = 12;

# More conservative (fewer false positives, some spam escapes):
$spam_score_threshold = 18;
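
Before changing the threshold, it is worth estimating how many logged articles each candidate value would reject. The following sketch assumes that an article is rejected when its score is greater than or equal to the threshold (an assumption; check the comparison used in your cleanfeed.local). `rejected_at` is a hypothetical helper.

```bash
# rejected_at THRESHOLD: read one numeric score per line on stdin and print
# how many of them would be rejected, assuming rejection when
# score >= THRESHOLD.
rejected_at() {
    awk -v t="$1" '$1 + 0 >= t + 0 { n++ } END { print n + 0 }'
}
```

With scores extracted as in the Monitoring section, for example `grep -oP 'score: \K\d+' /var/log/news/news.notice | rejected_at 18`, the counts for 12, 15, and 18 show how much traffic sits in the borderline band.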

Adjusting Crossposting Limits

# In cleanfeed.local:
$maxgroups = 8;                # Default (reduced from historical 14)

# More permissive:
$maxgroups = 10;

# More strict:
$maxgroups = 6;

Adding Custom Blacklist/Whitelist

# In cleanfeed.local, add:

# Blacklist specific domain
push @from_blacklist_patterns, qr/\@spam-domain\.com$/i;  # escape @ so Perl does not interpolate an array

# Whitelist trusted user
if ($from =~ /trusted-user\@example\.com/i) {
    $spam_score -= 10;  # Heavy reduction
}

# Whitelist trusted newsgroup
if ($newsgroups =~ /^local\.trusted\./i) {
    $spam_score = 0;  # Clear all penalties
}

Site-Specific Adjustments

# File paths (adjust for your system)
$active_file = '/usr/local/news/db/active';
$stats_file = '/var/log/cleanfeed/stats.log';
$emp_dump_file = '/var/log/cleanfeed/emp_dump.log';
$debug_dir = '/var/log/cleanfeed/debug/';

# Trusted networks (adjust for your server)
$trusted_networks = '10.0.0.0/8,192.168.0.0/16';

📊 Monitoring and Maintenance

Daily Monitoring

# Check rejection logs
grep REJECT /var/log/news/news.notice | tail -50

# Count spam rejected today (assumes ISO dates in the log; adjust the date format to match your syslog)
grep REJECT /var/log/news/news.notice | grep -c "$(date +%Y-%m-%d)"

# Check for errors
tail -50 /var/log/news/errlog

# Monitor URL shortener detections (high priority)
grep "URL shortener" /var/log/news/news.notice | tail -20

Weekly Tasks

Review False Positive Reports
- Check abuse@ mailbox for complaints
- Investigate any legitimate posts rejected
- Adjust patterns if needed
Update URL Shortener List
- Check SURBL (http://www.surbl.org/) for new abused shorteners
- Add to @blocked_url_shorteners in cleanfeed.local
Review Gateway Spam
- Check if gateways are being abused
- Adjust gateway rules if needed

Analyze Spam Patterns

# Top spam Message-ID domains
grep REJECT /var/log/news/news.notice | grep -oP 'MsgID: <[^@]+@\K[^>]+' | sort | uniq -c | sort -rn | head -20

# Most common rejection reasons
grep REJECT /var/log/news/news.notice | grep -oP 'reason: \K.+' | sort | uniq -c | sort -rn | head -20

Monthly Tasks

Update Spam Keywords
- Research current crypto/investment scam keywords
- Add to %spam_keywords in cleanfeed.local

Review Scoring Effectiveness

# Score distribution of rejected posts
grep REJECT /var/log/news/news.notice | grep -oP 'score: \K\d+' | sort -n | uniq -c

Check Remailer Operational Status
- Verify known remailers are still operational
- Add new legitimate remailers to whitelist
- Remove defunct remailers

Performance Review

# Check cleanfeed CPU usage
top -bn1 | grep innd

# Check filter processing time
grep "filter processing" /var/log/news/news.notice | tail -100

Quarterly Tasks

Comprehensive Configuration Review
- Re-read this README
- Check for cleanfeed updates upstream
- Review all customizations
Community Consultation
- Compare notes with other Usenet admins
- Share effective patterns
- Learn about new spam techniques
Pattern Database Refresh
- Review all Message-ID spam patterns
- Update gateway rules
- Refresh keyword lists
Documentation Update
- Document any custom changes
- Update local procedures
- Train other admins

📈 Success Metrics

Track these KPIs monthly:

Metric	Target	Measurement
Spam Rejection Rate	>95%	(Spam rejected / Total spam)
False Positive Rate	<1%	(Legit rejected / Total legit)
Anonymous Post Rejection Rate	<5%	(Anon rejected / Total anon)
URL Shortener Detections	High initially	Track trend (should decrease as spammers learn)
User Complaints	<5/month	Abuse@ mailbox
Server Performance Impact	<1% CPU	top/htop monitoring
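
The rate metrics in the table are simple ratios over a hand-classified sample. A small sketch of the arithmetic follows; `pct` is a hypothetical helper, and the counts must come from your own classification of accepted and rejected articles.

```bash
# pct PART TOTAL: print PART/TOTAL as a percentage with two decimals,
# or "n/a" when TOTAL is zero.
pct() {
    LC_ALL=C awk -v p="$1" -v t="$2" \
        'BEGIN { if (t + 0 == 0) print "n/a"; else printf "%.2f\n", 100 * p / t }'
}
```

If a sample of 1000 spam articles had 970 rejected, `pct 970 1000` prints 97.00, which meets the >95% target. If 3 of 400 legitimate articles were rejected, `pct 3 400` prints 0.75, which is inside the <1% target.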

🐛 Troubleshooting

Issue: Legitimate Posts Being Rejected

Symptoms: Users complain about posts not appearing, false positives

Diagnosis:

# Find rejection in logs
grep "Message-ID: <user-msgid>" /var/log/news/news.notice

# Check spam score and reason
grep "<user-msgid>" /var/log/news/news.notice | grep -E "(score|reason)"

Solutions:

If score is borderline (13-17), raise threshold slightly
If specific pattern is wrong, disable or refine that pattern
If user is legitimate frequent poster, add to whitelist
If anonymous post, verify anonymous protections are active

Issue: Spam Getting Through

Symptoms: Spam posts appearing in newsgroups

Diagnosis:

# Check spam score of post that got through
# Get the Message-ID from the spam post, then:
grep "<spam-msgid>" /var/log/news/news.notice

# If not in logs, cleanfeed didn't see it (configuration issue)
# If in logs with low score, patterns need updating

Solutions:

If score is too low (<8), lower threshold
If pattern not detected, add new pattern for that spam type
If Breidbart Index spam, verify EMP detection is enabled
If gateway spam, verify gateway rules are active

Issue: High CPU Usage

Symptoms: innd process using excessive CPU, slow article processing

Diagnosis:

# Check cleanfeed processing time
grep "filter processing time" /var/log/news/news.notice | tail -100

# Profile Perl execution (requires Devel::NYTProf)
perl -d:NYTProf ~news/bin/filter/cleanfeed < test_article.txt

Solutions:

Pre-compile all regex patterns at startup (should already be done)
Reduce number of patterns (remove low-value checks)
Enable caching for expensive lookups (MD5, rate limits)
Consider moving rate limit tracking to database (MySQL/PostgreSQL)
Check for infinite loops or inefficient regex

Issue: Anonymous Posts Being Blocked

Symptoms: Complaints from anonymous users, Tor/remailer posts rejected

Diagnosis:

# Check anonymous post logs
grep "anonymous" /var/log/news/news.notice | tail -50

# Verify anonymous detection (the dot is escaped so it only matches a literal ".onion")
grep -E '\.onion|remailer' /var/log/news/news.notice | grep REJECT

Solutions:

Verify anonymous protection code is active in cleanfeed.local
Check that anonymous adjustments are being applied
Ensure threshold for anonymous posts is 20 (not 15)
Review specific rejection reason - may be legitimate spam indicator
If false positive, add grace period or manual review
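
The separate threshold for anonymous posts can be pictured as a lookup on the posting host. The sketch covers only the .onion case, because the remailer whitelist is not reproduced in this README. `threshold_for` is a hypothetical name, and the real filter may identify anonymous posts from other headers.

```bash
# threshold_for HOST: print the rejection threshold for a posting host,
# 20 for .onion hosts and 15 otherwise (both values are stated in this README).
threshold_for() {
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$host" in
        *.onion) echo 20 ;;
        *)       echo 15 ;;
    esac
}
```

If a rejected anonymous post shows a score between 15 and 19 in the logs, the standard threshold was applied and the anonymous adjustment is probably not active.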

Issue: Gateway Spam Not Blocked

Symptoms: Spam from gated-at.bofh.it, fidonet.org getting through

Diagnosis:

# Check gateway detection
grep "gated-at\|fidonet\|pugleaf" /var/log/news/news.notice

# Verify gateway rules are present in the configuration
# (perl -c only reports "syntax OK", so search the file itself)
grep -n -i gateway ~news/bin/filter/cleanfeed.local

Solutions:

Verify gateway detection code is present in cleanfeed.local
Check gateway domain regex patterns are correct
Ensure gateway penalties are being applied
Review gateway-specific rules (crosspost limits, URL limits)
Consider lowering gateway-specific thresholds
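
When checking the domain patterns, a stand-alone test against sample Message-IDs is quicker than waiting for live traffic. The sketch assumes the gateway is identified by the Message-ID domain, which may not match the real rules (they could inspect Path or other headers). `GATEWAYS` and `from_gateway` are hypothetical names, and the list contains only the three domains named in the Overview.

```bash
# Gateway domains named in this README's overview.
GATEWAYS='bofh\.it|fidonet\.org|pugleaf\.net'

# from_gateway MSGID: succeed if the Message-ID's domain is one of the
# listed gateway domains or a subdomain of one. Anchoring on "@" and ">"
# prevents matches on look-alikes such as notbofh.it or
# fidonet.org.example.com.
from_gateway() {
    printf '%s\n' "$1" | grep -Eiq "@([a-z0-9-]+\.)*(${GATEWAYS})>\$"
}
```

For instance, `from_gateway '<123@gated-at.bofh.it>' && echo gateway` prints `gateway`, while an ID ending in `@notbofh.it>` does not match.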

Issue: Configuration Errors on Startup

Symptoms: cleanfeed fails to load, Perl syntax errors

Diagnosis:

# Check syntax
perl -c ~news/bin/filter/cleanfeed

# Check for compilation errors with warnings (-c compiles without running the script)
perl -wc ~news/bin/filter/cleanfeed

# Check INN error log
tail -50 /var/log/news/errlog

Solutions:

Fix Perl syntax errors (missing semicolons, unmatched braces)
Verify all regex patterns are properly formatted
Check that all variables are declared
Ensure cleanfeed.local is properly loaded
Restore from backup if severely broken

📞 Support and Feedback

Reporting Issues

If you encounter problems:

Check logs first: /var/log/news/news.notice and /var/log/news/errlog
Test syntax: perl -c ~news/bin/filter/cleanfeed
Review this README: Many issues are covered in Troubleshooting
Contact your Usenet admin: They know your specific setup

Reporting False Positives

If your legitimate post was rejected:

Contact the server admin at their published abuse@ address
Provide the Message-ID of your post
Briefly explain why it's legitimate (you do not need to justify anonymity)
Be patient - admins usually review within 24-48 hours

Contributing Improvements

If you find new spam patterns or improve the configuration:

Document the pattern with evidence
Test thoroughly to avoid false positives
Share with the Usenet admin community
Consider contributing to upstream cleanfeed project

📚 Additional Resources

Cleanfeed

Original Project: https://github.com/crooks/cleanfeed
Official Site: http://www.mixmin.net/cleanfeed/
Author: Steve Crook

Usenet Standards

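RFC 850: Standard for Interchange of USENET Messages (obsolete, superseded by RFC 1036)
RFC 1036: Standard for Interchange of USENET Messages (obsolete, superseded by RFC 5536 and RFC 5537)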
RFC 5536: Netnews article format (current)
RFC 5537: Netnews architecture

Spam Detection

Breidbart Index: https://en.wikipedia.org/wiki/Breidbart_Index
SURBL: http://www.surbl.org/ (URL blacklists)
SpamAssassin: https://spamassassin.apache.org/

Privacy and Anonymity

Tor Project: https://www.torproject.org/
EFF: https://www.eff.org/ (digital rights)
Anonymous Remailer FAQ: (various resources online)

Usenet Admin Resources

news.software.nntp: Usenet server admin newsgroup
news.admin.net-abuse: Spam and abuse discussions
INN Documentation: https://www.eyrie.org/~eagle/software/inn/

📄 License

This enhanced configuration is based on cleanfeed by Steve Crook.

Original cleanfeed license: (check upstream repository)

Enhancements (2025 patterns, gateway rules, anonymous protections):

Provided as-is for Usenet server operators
Free to use, modify, and distribute
No warranty of any kind
Use at your own risk

🙏 Acknowledgments

Steve Crook: Original cleanfeed author and maintainer
Usenet Admin Community: Shared knowledge and spam fighting experience
Tor Project: Privacy technology that protects vulnerable users
Anonymous Remailer Operators: Providing essential anonymity services
SURBL: URL blacklist intelligence
Security Researchers: Documenting spam and malware campaigns

📝 Changelog

Version 2025.1 (2025-10-12)

Added:

URL shortener blocking (SURBL top-10 + malware campaign URLs)
Cryptocurrency/NFT/investment spam keywords
Phishing pattern detection
Gateway abuse rules (bofh.it, fidonet.org, pugleaf.net, spot.net)
Anonymous service protection (11 remailers + .onion wildcard)
Modern Message-ID spam patterns
Enhanced rate limiting (multi-level)
Forged freemail detection

Removed:

Google Groups rules (Google Groups stopped carrying new Usenet content in Feb 2024)
Pre-2020 spam source patterns (PostIT Now, AudioWeb, etc.)
Overly aggressive binary detection
Site-specific rules (MI5, etc.)

Updated:

Crossposting limits (14 → 8 max groups)
Spam scoring thresholds (retuned)
Binary detection (hierarchy-specific whitelist approach)
Comprehensive keyword refresh

Performance:

Pre-compiled regex patterns
Optimized pattern matching order
Caching for expensive operations
Estimated <1% CPU overhead

🚨 Quick Start Checklist

Backup existing cleanfeed configuration
Copy cleanfeed.local to ~news/bin/filter/
Test syntax: perl -c ~news/bin/filter/cleanfeed
Enable shadow mode (2 weeks recommended)
Review shadow mode logs daily
Tune thresholds based on logs
Disable shadow mode, enable soft launch (conservative threshold)
Monitor for false positives (1-2 weeks)
Lower to standard threshold (15)
Set up monitoring (daily/weekly/monthly tasks)
Document any customizations
Publish abuse contact for false positive reports
Schedule quarterly configuration review

Updated: 2025-10-12
Maintainer: Your Usenet Server Admin
Contact: abuse@your-server.net