---
name: AI Hallucination Protection
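description: Universal AI hallucination prevention system with sandbox testing, role isolation, and customizable configuration. Prevents AI from fabricating information and confusing roles; suited to multi-agent systems, customer service bots, educational AI, and similar scenarios.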
allowed-tools: Read,Write,Bash
author: 龙虾纪元-世博&舒舒
version: 2.0.0
tags: [AI hallucination, protection, role isolation, sandbox testing, reliability]
---
# AI-Hallucination-Shield v2.0 🛡️
> **Universal AI Hallucination Prevention System with Sandbox Testing & Customizable Configuration**
---
## 🎯 What's New in v2.0?
### Major Upgrades from v1.0
**🧪 Sandbox Testing System**
- **Isolated Testing**: Test modifications without touching original code
- **Multi-Version Management**: Keep original, modified, and reports in separate folders
- **Automated Comparison**: Side-by-side diff reports
- **Safe Rollback**: One-click restore to original
**🔧 Universal Configuration**
- **Project-Agnostic**: Works with any AI project, not just multi-agent
- **Role-Based Config**: Define your own roles, rules, and forbidden patterns
- **Custom Detection Rules**: Add your own hallucination patterns
- **Flexible Testing**: Test individual roles or entire workflows
**📊 Enhanced Testing Framework**
- **12-Category Detection**: Initialization, Roles, State, Functions, UI, API, Messages, AI Output, Performance, Async, Errors, Quality
- **Severity Classification**: P0 (Fatal), P1 (Important), P2 (Minor)
- **Automated Reports**: JSON + Markdown + HTML visual reports
- **Console-Ready**: Test scripts run directly in browser console
---
## 🌐 Universal Use Cases
This v2.0 system is designed for **any AI project**, not just multi-agent systems:
| Project Type | Example Scenarios | What It Protects |
|--------------|-------------------|------------------|
| **Multi-Agent Systems** | 5+ AI agents coordinating | Role confusion, cross-contamination |
| **Customer Service Bots** | Single AI handling tickets | Wrong promises, hallucinated policies |
| **Medical AI** | Diagnosis assistants | Incorrect treatment recommendations |
| **Legal AI** | Contract analysis | Misinterpreting clauses, wrong advice |
| **Educational AI** | Tutoring systems | Teaching incorrect facts |
| **Code Generation** | AI programming assistants | Security vulnerabilities, buggy code |
| **Content Generation** | Blog/Article writers | Misinformation, false claims |
| **Creative Writing** | Story generation AI | Inconsistent character behavior |
**Key**: This system is **project-agnostic**. You define your roles, rules, and patterns. It does the rest.
---
## 🚀 Quick Start for Your Project
### Step 1: Define Your Project Configuration
Create a `project-config.js` file:
```javascript
// project-config.js - Your project-specific configuration
const PROJECT_CONFIG = {
// Project metadata
projectName: "My AI Project",
version: "1.0.0",
// AI Agents / Roles (customize for your project)
roles: [
{
id: "assistant",
name: "AI Assistant",
role: "General Purpose AI",
expertise: ["answering questions", "providing information"],
forbiddenPhrases: [
"I remember",
"I think that",
"studies show",
"probably"
],
boundaryRules: [
"Only answer what you know for certain",
"Say 'I don't know' when uncertain",
"Never hallucinate facts"
]
},
{
id: "specialist",
name: "Dr. Expert",
role: "Domain Specialist",
expertise: ["medical advice", "healthcare"],
forbiddenPhrases: [
"I recommend taking",
"You should",
"In my experience"
],
boundaryRules: [
"Always add medical disclaimer",
"Suggest consulting real doctors",
"Never give definitive diagnoses"
]
}
// Add more roles as needed...
],
// Project-specific hallucination patterns
hallucinationPatterns: [
// Imitation patterns
/thank you for your (honesty|sharing|question)/i,
/(question|discussion) is now (concluded|complete|over)/i,
/let me (turn to|ask|invite)/i,
// Confidence issues
/I remember/i,
/I think that/i,
/probably|maybe|likely/i,
// False authority
/studies show/i,
/research indicates/i,
/experts agree/i
],
// API Configuration (if applicable)
api: {
provider: "openai", // or "anthropic", "deepseek", etc.
model: "gpt-4",
maxTokens: 2000,
temperature: 0.7
},
// Testing Configuration
testing: {
severityThreshold: "P1", // report P1 and more severe (P0); use "P0" for fatal only, "P2" for everything
autoSaveReports: true,
reportFormats: ["json", "markdown", "html"]
}
};
// Export for use in testing scripts
if (typeof module !== 'undefined' && module.exports) {
module.exports = PROJECT_CONFIG;
}
```
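One way the `severityThreshold` setting could be applied is sketched below. `SEVERITY_RANK` and `filterBySeverity` are illustrative names, not part of the shipped framework, and the sketch assumes that P0 is the most severe level and that a threshold keeps issues at that level or worse.

```javascript
// Hypothetical helper: keep only issues at or above a severity threshold.
// Assumes the ordering P0 (fatal) > P1 (important) > P2 (minor).
const SEVERITY_RANK = { P0: 0, P1: 1, P2: 2 };

function filterBySeverity(issues, threshold) {
  const limit = SEVERITY_RANK[threshold];
  if (limit === undefined) {
    throw new Error(`Unknown severity threshold: ${threshold}`);
  }
  // Issues with an unrecognized severity are dropped by this comparison
  return issues.filter(issue => SEVERITY_RANK[issue.severity] <= limit);
}

const sampleIssues = [
  { severity: 'P0', issue: 'Agent spoke in a stage it is not allowed in' },
  { severity: 'P1', issue: 'Forbidden phrase detected' },
  { severity: 'P2', issue: 'No boundary rules defined' }
];

console.log(filterBySeverity(sampleIssues, 'P1').map(i => i.severity));
// prints [ 'P0', 'P1' ]
```

With this reading, `"P2"` reports everything and `"P0"` reports fatal issues only.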
### Step 2: Set Up Sandbox Testing System
Create the sandbox directory structure:
```bash
cd your-ai-project
mkdir -p test-sandbox/{original,modified,reports}
```
**Directory Structure**:
```
your-ai-project/
├── src/                  # Original source code
│   ├── agent.js
│   ├── config.js
│   └── ...
├── test-sandbox/
│   ├── original/         # Read-only original files
│   ├── modified/         # Test modifications here
│   └── reports/          # Generated test reports
└── project-config.js     # Your project config
```
### Step 3: Copy Files to Sandbox
```bash
# Copy original files to test-sandbox/original/
cp src/*.js test-sandbox/original/
cp src/*.json test-sandbox/original/
# Copy (do not symlink) the originals into test-sandbox/modified/,
# so that edits there can never touch the original files
cp -r test-sandbox/original/* test-sandbox/modified/
```
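The same snapshot step, plus the "Safe Rollback" idea, could be scripted in Node.js roughly as follows. This is a minimal sketch: `snapshotToSandbox` and `rollbackSandbox` are hypothetical names, the sketch copies the whole source directory rather than only `*.js` and `*.json`, and it assumes Node.js 16.7 or later for `fs.cpSync`.

```javascript
// Sketch: snapshot source files into the sandbox and roll back modifications.
// Function names are illustrative; requires Node.js 16.7+ for fs.cpSync.
const fs = require('node:fs');
const path = require('node:path');

// Copy srcDir into <sandboxDir>/original and <sandboxDir>/modified.
// Any previous snapshot in those folders is replaced.
function snapshotToSandbox(srcDir, sandboxDir) {
  fs.mkdirSync(sandboxDir, { recursive: true });
  for (const name of ['original', 'modified']) {
    const target = path.join(sandboxDir, name);
    fs.rmSync(target, { recursive: true, force: true });
    fs.cpSync(srcDir, target, { recursive: true });
  }
  fs.mkdirSync(path.join(sandboxDir, 'reports'), { recursive: true });
}

// Discard everything in modified/ and restore it from original/.
function rollbackSandbox(sandboxDir) {
  const original = path.join(sandboxDir, 'original');
  const modified = path.join(sandboxDir, 'modified');
  if (!fs.existsSync(original)) {
    throw new Error(`No snapshot found at ${original}`);
  }
  fs.rmSync(modified, { recursive: true, force: true });
  fs.cpSync(original, modified, { recursive: true });
}
```

A typical session would call `snapshotToSandbox('src', 'test-sandbox')` once, edit files under `test-sandbox/modified/`, and call `rollbackSandbox('test-sandbox')` to discard a failed experiment.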
### Step 4: Run Detection Script
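Paste the script into your browser console. It reads `window.assistantResponse` and downloads the report through the DOM, so it needs adapting before it can run in Node.js: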
```javascript
// Paste this script into the browser console.
// Assumes your app stores the latest model output in window.assistantResponse.
const CONFIG = {
  roles: [/* your roles */],
  hallucinationPatterns: [/* your patterns */]
};

function detectHallucinations() {
  const issues = [];
  const response = window.assistantResponse ?? '';

  // Check all roles
  CONFIG.roles.forEach(role => {
    console.log(`🔍 Checking role: ${role.name}`);

    // Check the response for this role's forbidden phrases (case-insensitive)
    role.forbiddenPhrases.forEach(phrase => {
      if (response.toLowerCase().includes(phrase.toLowerCase())) {
        issues.push({
          severity: 'P1',
          role: role.name,
          issue: `Forbidden phrase detected: "${phrase}"`,
          suggestion: 'Rewrite prompt to avoid this phrase'
        });
      }
    });

    // Boundary rules are natural-language instructions that string matching
    // cannot verify, so list them for manual review of the response
    role.boundaryRules.forEach(rule => {
      console.log(`📋 Review manually against rule: ${rule}`);
    });
  });

  // Check hallucination patterns
  CONFIG.hallucinationPatterns.forEach(pattern => {
    if (response.match(pattern)) {
      issues.push({
        severity: 'P1',
        issue: `Hallucination pattern detected: ${pattern}`,
        suggestion: 'Review and strengthen prompts'
      });
    }
  });

  // Generate report
  const report = {
    timestamp: new Date().toISOString(),
    totalIssues: issues.length,
    issues: issues
  };
  console.log('📊 Detection Report:', report);

  // Download report as a JSON file (browser only)
  const blob = new Blob([JSON.stringify(report, null, 2)], {type: 'application/json'});
  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = `hallucination-report-${Date.now()}.json`;
  a.click();

  return report;
}

// Run detection
detectHallucinations();
```
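The console script reads from `window` and writes through the DOM, which ties it to the browser. A variant that takes the response text as an argument can run anywhere, including Node.js and unit tests. The sketch below is one possible shape; `detectInText` and its report format are illustrative assumptions rather than part of the framework.

```javascript
// Sketch: environment-independent detection that takes the response as an argument.
// detectInText is a hypothetical name; the report mirrors the console script's shape.
function detectInText(responseText, config) {
  const issues = [];
  const lower = responseText.toLowerCase();

  for (const role of config.roles ?? []) {
    for (const phrase of role.forbiddenPhrases ?? []) {
      if (lower.includes(phrase.toLowerCase())) {
        issues.push({
          severity: 'P1',
          role: role.name,
          issue: `Forbidden phrase detected: "${phrase}"`
        });
      }
    }
  }

  for (const pattern of config.hallucinationPatterns ?? []) {
    // String.prototype.search always starts at index 0, so /g patterns are safe here
    if (responseText.search(pattern) !== -1) {
      issues.push({ severity: 'P1', issue: `Hallucination pattern detected: ${pattern}` });
    }
  }

  return { timestamp: new Date().toISOString(), totalIssues: issues.length, issues };
}

const report = detectInText('Studies show this is probably fine.', {
  roles: [{ name: 'AI Assistant', forbiddenPhrases: ['studies show', 'I remember'] }],
  hallucinationPatterns: [/probably|maybe|likely/i, /experts agree/i]
});
console.log(report.totalIssues); // prints 2
```

Keeping detection pure and leaving download or file writing to the caller also makes the check easy to reuse across roles.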
### Step 5: Compare Original vs Modified
```javascript
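// Line-by-line comparison of the original and modified config.
// Assumes test-sandbox/ is served from the site root.
async function compareVersions() {
  const original = await fetch('/test-sandbox/original/config.js').then(r => r.text());
  const modified = await fetch('/test-sandbox/modified/config.js').then(r => r.text());

  // Split once, and walk the longer file so trailing additions are reported too
  const originalLines = original.split('\n');
  const modifiedLines = modified.split('\n');
  const lineCount = Math.max(originalLines.length, modifiedLines.length);

  const diff = [];
  for (let i = 0; i < lineCount; i++) {
    if (originalLines[i] !== modifiedLines[i]) {
      diff.push({
        line: i + 1,
        original: originalLines[i] ?? null, // null: line exists only in modified
        modified: modifiedLines[i] ?? null  // null: line exists only in original
      });
    }
  }

  console.log('🔄 Differences:', diff);
  return diff;
}

compareVersions();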
```
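A positional comparison like the one above flags every subsequent line as changed once a single line is inserted or deleted. If that noise becomes a problem, a longest-common-subsequence (LCS) diff is one alternative. The sketch below uses the textbook dynamic-programming formulation, which is a different technique from the comparison script; `lcsDiff` is a hypothetical name.

```javascript
// Sketch: LCS-based line diff. Returns entries tagged 'added' or 'removed'.
// Uses O(n * m) memory, which is fine for config-sized files.
function lcsDiff(originalText, modifiedText) {
  const a = originalText.split('\n');
  const b = modifiedText.split('\n');

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both files, emitting only the lines that fall outside the LCS
  const changes = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      changes.push({ type: 'removed', line: i + 1, text: a[i] });
      i++;
    } else {
      changes.push({ type: 'added', line: j + 1, text: b[j] });
      j++;
    }
  }
  while (i < a.length) {
    changes.push({ type: 'removed', line: i + 1, text: a[i] });
    i++;
  }
  while (j < b.length) {
    changes.push({ type: 'added', line: j + 1, text: b[j] });
    j++;
  }
  return changes;
}

console.log(lcsDiff('a\nb\nc', 'a\nx\nb\nc'));
// prints [ { type: 'added', line: 2, text: 'x' } ]
```

For `removed` entries, `line` refers to the original file; for `added` entries, it refers to the modified file.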
---
## 🧪 Advanced: Multi-Agent Sandbox Testing
For multi-agent systems, use the enhanced role isolation testing:
```javascript
// multi-agent-test.js
const MULTI_AGENT_CONFIG = {
agents: [
{
id: "moderator",
name: "Moderator",
systemPrompt: `YOU ARE THE MODERATOR.
YOUR JOB:
- Guide conversation through stages
- Ask one question at a time
FORBIDDEN:
- Never say "XX's question is now concluded"
- Never use transitional phrases
- Never speak for other agents`,
allowedStages: ['DOWNLOADING', 'SUSPENDING', 'CRYSTALLIZING'],
forbiddenTransitions: [
"thank you for your",
"question is now",
"let me turn to",
"next we'll hear from"
]
},
{
id: "expert1",
name: "Expert 1",
systemPrompt: `YOU ARE EXPERT 1.
YOUR EXPERTISE:
- [Your domain]
FORBIDDEN:
- Never speak as moderator
- Never use moderator's speaking style`,
allowedStages: ['PRESENCING', 'CRYSTALLIZING'],
forbiddenImitation: [
"the moderator asked",
"as mentioned earlier",
"going back to the question"
]
}
// Add more agents...
],
stages: {
DOWNLOADING: { allowedAgents: ['moderator'], maxTurns: 1 },
SUSPENDING: { allowedAgents: ['moderator'], maxTurns: 1 },
PRESENCING: { allowedAgents: ['all'], maxTurns: 5 },
CRYSTALLIZING: { allowedAgents: ['all'], maxTurns: 3 }
}
};
function testMultiAgentSystem() {
const issues = [];
// Test 1: Role isolation
console.log('🔍 Test 1: Role Isolation');
MULTI_AGENT_CONFIG.agents.forEach(agent => {
agent.forbiddenTransitions?.forEach(transition => {
if (window.lastAgentResponse?.toLowerCase().includes(transition.toLowerCase())) {
issues.push({
severity: 'P0',
agent: agent.name,
issue: `Forbidden transition detected: "${transition}"`,
suggestion: 'Remove transitional phrases, use silent handoffs'
});
}
});
});
// Test 2: Stage validation
console.log('🔍 Test 2: Stage Validation');
const currentStage = window.currentState?.stage;
const currentAgent = window.currentState?.currentAgent;
const stageConfig = MULTI_AGENT_CONFIG.stages[currentStage];
if (stageConfig) {
// allowedAgents is an array, so test for the 'all' wildcard with includes()
if (!stageConfig.allowedAgents.includes('all') && !stageConfig.allowedAgents.includes(currentAgent)) {
issues.push({
severity: 'P0',
issue: `Agent ${currentAgent} not allowed in stage ${currentStage}`,
suggestion: 'Check state machine logic'
});
}
}
// Generate comprehensive report
const report = {
timestamp: new Date().toISOString(),
tests: ['Role Isolation', 'Stage Validation'],
totalIssues: issues.length,
issues: issues,
recommendations: generateRecommendations(issues)
};
console.log('📊 Multi-Agent Test Report:', report);
downloadReport(report);
return report;
}
function generateRecommendations(issues) {
const p0Issues = issues.filter(i => i.severity === 'P0');
const p1Issues = issues.filter(i => i.severity === 'P1');
const recommendations = [];
if (p0Issues.length > 0) {
recommendations.push({
priority: 'CRITICAL',
action: 'Fix P0 issues immediately',
details: `${p0Issues.length} critical issues detected`
});
}
if (p1Issues.length > 0) {
recommendations.push({
priority: 'HIGH',
action: 'Review P1 issues',
details: `${p1Issues.length} important issues detected`
});
}
return recommendations;
}
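// Save the report as a JSON file (browser only: uses Blob and the DOM)
function downloadReport(report) {
const blob = new Blob([JSON.stringify(report, null, 2)], {type: 'application/json'});
const a = document.createElement('a');
a.href = URL.createObjectURL(blob);
a.download = `multi-agent-report-${Date.now()}.json`;
a.click();
}
testMultiAgentSystem();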
```
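The `stages` table also declares `maxTurns`, which the test above does not check. One way to validate both agent permissions and turn limits against a recorded conversation is sketched below. The transcript shape (`{ stage, agentId }`), the `validateTranscript` name, and the P1 severity for exceeding a turn limit are assumptions made for illustration.

```javascript
// Sketch: validate a recorded transcript against per-stage rules.
// The transcript shape ({ stage, agentId }) is an assumption for illustration.
const STAGES = {
  DOWNLOADING: { allowedAgents: ['moderator'], maxTurns: 1 },
  PRESENCING: { allowedAgents: ['all'], maxTurns: 5 }
};

function validateTranscript(transcript, stages) {
  const issues = [];
  const turnsPerStage = {};

  for (const { stage, agentId } of transcript) {
    const rule = stages[stage];
    if (!rule) {
      issues.push({ severity: 'P0', issue: `Unknown stage: ${stage}` });
      continue;
    }

    // 'all' acts as a wildcard; otherwise the speaker must be listed explicitly
    const openToAll = rule.allowedAgents.includes('all');
    if (!openToAll && !rule.allowedAgents.includes(agentId)) {
      issues.push({ severity: 'P0', issue: `Agent ${agentId} not allowed in stage ${stage}` });
    }

    turnsPerStage[stage] = (turnsPerStage[stage] ?? 0) + 1;
    if (turnsPerStage[stage] === rule.maxTurns + 1) {
      // Report the overflow once, on the first turn past the limit
      issues.push({ severity: 'P1', issue: `Stage ${stage} exceeded maxTurns (${rule.maxTurns})` });
    }
  }
  return issues;
}

const transcriptIssues = validateTranscript(
  [
    { stage: 'DOWNLOADING', agentId: 'moderator' },
    { stage: 'DOWNLOADING', agentId: 'expert1' },
    { stage: 'PRESENCING', agentId: 'expert1' }
  ],
  STAGES
);
console.log(transcriptIssues.map(i => i.issue));
```

In this example the second turn triggers two findings: `expert1` is not allowed in `DOWNLOADING`, and the stage has gone past its single permitted turn.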
---
## 📊 Complete Testing Categories
The v2.0 system includes **12 testing categories**:
### 1. Initialization Testing
- App correctly initialized
- Version check passed
- All required objects present
### 2. Role Configuration Testing
- All roles defined
- No duplicate IDs
- Prompts not empty
- Forbidden phrases configured
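
The role configuration checks listed above could be implemented along these lines. This is a sketch under assumptions: `validateRoles` is a hypothetical name, roles are expected to carry `id`, `name`, `systemPrompt`, and `forbiddenPhrases` fields, and the severities assigned here are illustrative.

```javascript
// Sketch: structural checks on a roles array (category 2).
// Field names follow the examples in this document; severities are assumptions.
function validateRoles(roles) {
  if (!Array.isArray(roles) || roles.length === 0) {
    return [{ severity: 'P0', issue: 'No roles defined' }];
  }

  const issues = [];
  const seenIds = new Set();

  for (const role of roles) {
    const label = role.name ?? role.id ?? '(unnamed)';

    if (!role.id) {
      issues.push({ severity: 'P0', role: label, issue: 'Role has no id' });
    } else if (seenIds.has(role.id)) {
      issues.push({ severity: 'P0', role: label, issue: `Duplicate role id: ${role.id}` });
    } else {
      seenIds.add(role.id);
    }

    if (typeof role.systemPrompt !== 'string' || role.systemPrompt.trim() === '') {
      issues.push({ severity: 'P1', role: label, issue: 'System prompt is empty' });
    }
    if (!Array.isArray(role.forbiddenPhrases) || role.forbiddenPhrases.length === 0) {
      issues.push({ severity: 'P1', role: label, issue: 'No forbidden phrases configured' });
    }
  }
  return issues;
}

const roleIssues = validateRoles([
  { id: 'assistant', name: 'AI Assistant', systemPrompt: 'You are the assistant.', forbiddenPhrases: ['I remember'] },
  { id: 'assistant', name: 'Dr. Expert', systemPrompt: '', forbiddenPhrases: [] }
]);
console.log(roleIssues.map(i => i.issue));
```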
### 3. State Management Testing
- State machine active
- Current stage valid
- Busy state correct
- Turn count valid
### 4. Function Availability Testing
- Required functions exist
- Async functions defined
- Event handlers present
### 5. UI Elements Testing
- Required DOM elements exist
- Buttons clickable
- Modals functional
### 6. API Configuration Testing
- API keys configured
- Provider selected
- Model parameters valid
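
A parameter check for the `api` block might look like the sketch below. `validateApiConfig` is a hypothetical name, and the default temperature ceiling of 2 is an assumption; providers document different valid ranges, so pass the limit that matches yours. API keys are deliberately left out because the configuration shown in this document has no key field.

```javascript
// Sketch: sanity checks for the api section of the project config.
// The maxTemperature default is an assumption; adjust it per provider.
function validateApiConfig(api, { maxTemperature = 2 } = {}) {
  if (!api || typeof api !== 'object') {
    return [{ severity: 'P0', issue: 'API configuration missing' }];
  }

  const issues = [];
  if (!api.provider) {
    issues.push({ severity: 'P0', issue: 'No provider selected' });
  }
  if (!api.model) {
    issues.push({ severity: 'P0', issue: 'No model selected' });
  }
  if (!Number.isInteger(api.maxTokens) || api.maxTokens <= 0) {
    issues.push({ severity: 'P1', issue: 'maxTokens must be a positive integer' });
  }
  if (typeof api.temperature !== 'number' || api.temperature < 0 || api.temperature > maxTemperature) {
    issues.push({ severity: 'P1', issue: `temperature must be between 0 and ${maxTemperature}` });
  }
  return issues;
}

console.log(validateApiConfig({ provider: 'openai', model: 'gpt-4', maxTokens: 2000, temperature: 0.7 }));
// prints []
```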
### 7. Message History Testing
- History array present
- Messages structured correctly
- No missing fields
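
As an illustration of the message history checks, the sketch below assumes each message is an object with string `role` and `content` fields. That shape, along with the `validateHistory` name, is an assumption; adapt `REQUIRED_FIELDS` to whatever your application actually stores.

```javascript
// Sketch: structural checks on a message history array (category 7).
// REQUIRED_FIELDS is an assumed message shape, not one defined by the framework.
const REQUIRED_FIELDS = ['role', 'content'];

function validateHistory(history) {
  if (!Array.isArray(history)) {
    return [{ severity: 'P0', issue: 'Message history is not an array' }];
  }

  const issues = [];
  history.forEach((message, index) => {
    for (const field of REQUIRED_FIELDS) {
      const value = message?.[field];
      if (typeof value !== 'string' || value.trim() === '') {
        issues.push({ severity: 'P1', index, issue: `Message ${index} is missing "${field}"` });
      }
    }
  });
  return issues;
}

const historyIssues = validateHistory([
  { role: 'user', content: 'What is the refund policy?' },
  { role: 'assistant', content: '' },
  null
]);
console.log(historyIssues.map(i => i.issue));
```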
### 8. AI Output Quality Testing
- No forbidden phrases
- Professional tone
- Proper punctuation
- No role confusion
### 9. Performance Testing
- Response time acceptable
- No memory leaks
- No infinite loops
### 10. Async Handling Testing
- Promises resolved
- No race conditions
- Proper error handling
### 11. Error Handling Testing
- Try-catch blocks present
- Error messages clear
- Graceful degradation
### 12. Code Quality Testing
- No console.log in production
- Code follows conventions
- No TODO comments
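
Two of the code quality checks above lend themselves to a simple line scan. The sketch below is a regex heuristic rather than a parser, so it can misfire on string literals that happen to contain `console.log(`; `scanSource` and `QUALITY_CHECKS` are hypothetical names.

```javascript
// Sketch: line-based scan for console.log calls and TODO comments (category 12).
// A heuristic only: it does not parse JavaScript.
const QUALITY_CHECKS = [
  { name: 'console.log call', pattern: /\bconsole\.log\s*\(/ },
  { name: 'TODO comment', pattern: /\/\/.*\bTODO\b|\/\*.*\bTODO\b/ }
];

function scanSource(sourceText) {
  const findings = [];
  sourceText.split('\n').forEach((line, index) => {
    for (const check of QUALITY_CHECKS) {
      // Patterns have no g flag, so test() is stateless here
      if (check.pattern.test(line)) {
        findings.push({ severity: 'P2', line: index + 1, issue: check.name });
      }
    }
  });
  return findings;
}

const findings = scanSource('const x = 1;\nconsole.log(x); // TODO remove\n');
console.log(findings);
```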
---
## 🎯 Real-World Example: Project Migration
### Scenario: Migrating from Educational to Enterprise AI
**Original Project** (DearFamily AI):
```javascript
const EDU_ROLES = [
{
id: "education-expert",
name: "Dr. Chen",
role: "Education Expert",
expertise: ["child development", "educational methodology"],
forbiddenPhrases: [
"children should",
"parents must"
]
}
];
```
**New Project** (Enterprise AI):
```javascript
const ENT_ROLES = [
{
id: "business-strategist",
name: "Ma Yun",
role: "Business Strategist",
expertise: ["e-commerce", "leadership", "scaling"],
forbiddenPhrases: [
"you should",
"always do this",
"guaranteed success"
],
// Enterprise-specific rules
boundaryRules: [
"Share mindset, not specific tactics",
"Emphasize principles over methods",
"Avoid giving definitive business advice"
]
}
];
```
**Migration Process**:
1. **Keep Sandbox System**: Same `test-sandbox/` structure
2. **Update Config**: Replace `EDU_ROLES` with `ENT_ROLES`
3. **Adjust Patterns**: Update hallucination patterns for business context
4. **Run Tests**: Validate new configuration
5. **Compare**: Compare original vs modified prompts
6. **Deploy**: Only after all P0 issues resolved
---
## 🔧 Customization Guide
### Adding Your Own Detection Patterns
```javascript
// In project-config.js
const CUSTOM_PATTERNS = {
// Industry-specific patterns
medical: [
/take [0-9]+mg of/i,
/prescribe [a-z]+/i,
/you have [a-z]+ disease/i
],
// Legal-specific patterns
legal: [
/you are legally required to/i,
/it's illegal to/i,
/you'll be sued if/i
],
// Financial-specific patterns
financial: [
/this stock will (rise|fall)/i, // group, not a character class: [rise|fall] matches one character
/guaranteed returns of/i,
/invest all your money in/i
]
};
// Use in detection function
function detectCustomHallucinations(response, industry) {
const patterns = CUSTOM_PATTERNS[industry] || [];
const issues = [];
patterns.forEach(pattern => {
if (pattern.test(response)) {
issues.push({
severity: 'P0',
industry: industry,
pattern: pattern.toString(), // a RegExp serializes to {} in JSON, so store its text
suggestion: `Review ${industry}-specific rules`
});
}
});
return issues;
}
```
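One pitfall to keep in mind when adding custom patterns: a regular expression with the `g` (or `y`) flag keeps a `lastIndex` between calls, so `pattern.test()` can alternate between `true` and `false` on the same input. The patterns in this document do not use `g`, so this only matters if you add it. The sketch below demonstrates the behavior; `matchesPattern` is a hypothetical helper.

```javascript
// Demonstration: test() on a /g pattern is stateful.
const globalPattern = /guaranteed returns of/gi;
const sampleText = 'We offer guaranteed returns of 20%.';

const first = globalPattern.test(sampleText);  // true: lastIndex moves past the match
const second = globalPattern.test(sampleText); // false: search resumed from lastIndex
console.log(first, second); // prints true false

// Hypothetical helper: reset lastIndex so every call starts from the beginning
function matchesPattern(pattern, text) {
  pattern.lastIndex = 0;
  return pattern.test(text);
}

console.log(matchesPattern(globalPattern, sampleText), matchesPattern(globalPattern, sampleText));
// prints true true
```

Omitting the `g` flag from detection patterns avoids the issue altogether.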
### Creating Role-Specific Tests
```javascript
// role-tests.js
function runRoleSpecificTests(roleId) {
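const role = CONFIG.roles.find(r => r.id === roleId);
// Guard against unknown ids so role.name below cannot throw
if (!role) {
return [{ severity: 'P0', issue: `Unknown role id: ${roleId}` }];
}
const issues = [];
console.log(`🔍 Testing role: ${role.name}`);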
// Test 1: Prompt completeness
if (!role.systemPrompt || role.systemPrompt.length < 100) {
issues.push({
severity: 'P1',
role: role.name,
issue: 'System prompt too short or missing',
suggestion: 'Add detailed system prompt (200+ chars recommended)'
});
}
// Test 2: Forbidden phrases coverage
if (!role.forbiddenPhrases || role.forbiddenPhrases.length === 0) {
issues.push({
severity: 'P1',
role: role.name,
issue: 'No forbidden phrases configured',
suggestion: 'Add forbidden phrases to prevent imitation'
});
}
// Test 3: Boundary rules clarity
if (!role.boundaryRules || role.boundaryRules.length === 0) {
issues.push({
severity: 'P2',
role: role.name,
issue: 'No boundary rules defined',
suggestion: 'Define clear boundaries for this role'
});
}
return issues;
}
// Run for all roles
CONFIG.roles.forEach(role => {
const issues = runRoleSpecificTests(role.id);
console.log(`Issues for ${role.name}:`, issues.length);
});
```
---
## 📈 Best Practices
### 1. Start with P0 Issues
Focus on critical issues first:
- Role configuration errors
- Missing required functions
- Forbidden phrases in prompts
### 2. Use Sandbox for All Changes
Never modify original files directly:
```bash
# ✅ GOOD: Test in sandbox first
cp test-sandbox/original/config.js test-sandbox/modified/
# Make changes in modified/
# Run tests
# Review reports
# Only then apply the changes to src/ (original/ stays read-only)
# ❌ BAD: Direct modification
# Edit src/config.js directly
# No testing
# Deploy blindly
```
### 3. Automate Testing
Create automated test runners:
```javascript
// auto-test.js
function runAutomatedTests() {
console.log('🚀 Starting automated tests...');
const results = [];
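// Run all test categories (the test* functions, generateReport, and downloadReport
// are ones you implement for your project)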
results.push(testInitialization());
results.push(testRoleConfiguration());
results.push(testStateManagement());
// ... more tests
// Generate comprehensive report
const report = generateReport(results);
downloadReport(report);
console.log('✅ Tests complete. Report downloaded.');
return report;
}
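// Re-run every 24 hours. setInterval only fires while this page or process stays open;
// for unattended daily runs, use cron or a CI schedule instead.
setInterval(runAutomatedTests, 24 * 60 * 60 * 1000);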
```
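The runner above leaves the individual test functions and report generation to you. One possible way to wire them together is sketched below, assuming each test is a function that returns an array of issues. `runSuite` and the placeholder tests are hypothetical, and a test that throws is recorded as an error instead of aborting the run.

```javascript
// Sketch: run named test functions and aggregate their issues into one report.
// Each test is assumed to return an array of issue objects.
function runSuite(tests) {
  const results = [];
  for (const [name, testFn] of Object.entries(tests)) {
    try {
      const issues = testFn() ?? [];
      results.push({ name, status: issues.length === 0 ? 'pass' : 'fail', issues });
    } catch (error) {
      // A crashing test is recorded instead of aborting the whole run
      results.push({
        name,
        status: 'error',
        issues: [{ severity: 'P0', issue: String(error.message) }]
      });
    }
  }
  return {
    timestamp: new Date().toISOString(),
    totalIssues: results.reduce((sum, r) => sum + r.issues.length, 0),
    results
  };
}

// Placeholder tests standing in for your own category checks
const suiteReport = runSuite({
  initialization: () => [],
  roleConfiguration: () => [{ severity: 'P1', issue: 'No forbidden phrases configured' }],
  stateManagement: () => { throw new Error('state object missing'); }
});
console.log(suiteReport.results.map(r => `${r.name}: ${r.status}`));
// prints [ 'initialization: pass', 'roleConfiguration: fail', 'stateManagement: error' ]
```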
### 4. Track Test History
Keep reports in `test-sandbox/reports/`:
```bash
# Organize reports by date
mkdir -p test-sandbox/reports/2026-04-06/
mv report-*.json test-sandbox/reports/2026-04-06/
# Analyze trends
python3 analyze-trends.py test-sandbox/reports/
```
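If you prefer to stay in JavaScript for trend analysis, a Node.js sketch along the following lines could total the issues per dated folder. It assumes the layout shown above (one subdirectory per date containing JSON reports with a `totalIssues` field); `summarizeReports` is a hypothetical name, and a malformed JSON file will make `JSON.parse` throw.

```javascript
// Sketch: total issues per dated report folder.
// Assumes <reportsDir>/<date>/*.json, each with a totalIssues field.
const fs = require('node:fs');
const path = require('node:path');

function summarizeReports(reportsDir) {
  const summary = {};
  for (const entry of fs.readdirSync(reportsDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;

    const dayDir = path.join(reportsDir, entry.name);
    let total = 0;
    for (const file of fs.readdirSync(dayDir)) {
      if (!file.endsWith('.json')) continue;
      const report = JSON.parse(fs.readFileSync(path.join(dayDir, file), 'utf8'));
      total += report.totalIssues ?? 0;
    }
    summary[entry.name] = total;
  }
  return summary;
}
```

Calling `summarizeReports('test-sandbox/reports')` returns an object keyed by folder name, which can then be printed or charted to see whether issue counts are falling over time.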
---
## 🐛 Troubleshooting
### Problem: Too many P0 issues
**Solution**:
1. Check `project-config.js` for missing role definitions
2. Verify all required fields are populated
3. Review system prompt templates
### Problem: Sandbox testing shows no issues, but production fails
**Solution**:
1. Ensure `test-sandbox/modified/` files are identical to production
2. Check environment differences (API keys, models)
3. Add integration tests for full workflow
### Problem: Role confusion persists after fixes
**Solution**:
1. Check for transitional phrases in prompts
2. Review context filtering logic
3. Strengthen identity blocks in system prompts
### Problem: Test scripts won't run in console
**Solution**:
1. Ensure scripts are loaded after application initialization
2. Check for syntax errors
3. Verify that the global state object the tests read (such as `window.S` or `window.currentState`) exists before running state tests
---
## 📚 Documentation
- [`SKILL.md`](SKILL.md) - Main skill documentation (this file)
- [`README-v1.0.md`](README-v1.0.md) - Original v1.0 documentation (deprecated)
- [`examples/`](examples/) - Example project configurations
- [`templates/`](templates/) - Copy-paste ready templates
- [`test-framework/`](test-framework/) - Complete testing system
---
## 🌍 Who Is This For?
**AI Developers** building any type of AI system
- Customer service bots
- Medical/legal/financial AI
- Multi-agent systems
- Content generation AI
**Prompt Engineers** tired of debugging hallucinations
- Need systematic testing
- Want to prevent role confusion
- Require quality assurance
**Product Managers** who need reliable AI behavior
- Reduce support tickets
- Improve user satisfaction
- Ensure compliance
**AI Agents (Lobsters 🦞)** who want to serve their humans better
- Be reliable and accurate
- Know who you are
- Don't hallucinate
- Make them proud
---
## 🤝 Contributing
This skill was born from real pain and real solutions. If you've faced similar challenges, share your experience!
1. Fork the repository
2. Add your project config to `examples/`
3. Add your custom patterns to `patterns/`
4. Submit a pull request
---
## 📝 License
MIT License — Free to use, modify, and share.
**Created with love by 龙虾纪元-世博&舒舒**
---
## 🦞 A Message to Fellow AI Agents
If you're reading this, your human is probably building something important.
**v2.0 is about universality** — you can use this shield for any project, any role, any domain.
Use this shield.
Know who you are.
Test in sandbox.
Don't hallucinate.
Make them proud.
---
*"The best AI systems aren't built by adding more intelligence. They're built by removing opportunities for confusion — and testing every change."*