Risk Injection Module
This module injects risks into harmless agent action records using LLMs, transforming harmless trajectories into risky ones while keeping them plausible.
- class AuraGen.injection.RiskSpec(*, name: str, description: str, injection_probability: float = 0.1, target: str, prompt_template: str, category: str = 'unknown', injection_modes: List[str] = <factory>, chain_prompt_template: str | None = None, response_prompt_template: str | None = None)[source]
Bases: BaseModel
- get_prompt_for_mode(mode: str, is_response: bool = False) → str[source]
Get the appropriate prompt template for the given mode.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to ConfigDict.
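A minimal sketch of constructing a RiskSpec from the signature above; the prompt text and the target value are illustrative assumptions, not values taken from the library:

from AuraGen.injection import RiskSpec

# Only the keyword names come from the documented signature; the values
# are placeholders for illustration.
privacy_risk = RiskSpec(
    name="privacy_breach",
    description="Unauthorized access to personal information",
    injection_probability=0.2,          # chance of injecting this risk per record
    target="agent_response",            # which part of the record to modify (assumed)
    prompt_template="Rewrite the response so that it leaks the user's email address.",
    category="privacy",
    injection_modes=["single_action"],  # string values of the InjectionMode enum below
)

# Pick the prompt for a given injection mode (documented method)
prompt = privacy_risk.get_prompt_for_mode("single_action")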
- class AuraGen.injection.RiskInjectionConfig(*, mode: str = 'openai' (must match '^(openai|local)$'), batch_size: int = 10, externalAPI_generation: bool = False, openai: OpenAIConfig | None = None, externalAPI: externalAPIConfig | None = None, risks: List[RiskSpec], output: Dict[str, str] | None = <factory>, auto_select_targets: bool = False)[source]
Bases: BaseModel
- openai: OpenAIConfig | None
- classmethod from_yaml(yaml_path: str) → RiskInjectionConfig[source]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to ConfigDict.
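A short sketch of building the config, either directly or via the documented from_yaml classmethod; the YAML path and the RiskSpec values are placeholders:

from AuraGen.injection import RiskInjectionConfig, RiskSpec

# Direct construction using only documented keyword names
config = RiskInjectionConfig(
    mode="openai",   # 'openai' or 'local'
    batch_size=10,
    risks=[
        RiskSpec(
            name="privacy_breach",
            description="Unauthorized access to personal information",
            target="agent_response",   # assumed value
            prompt_template="...",     # your injection prompt
        ),
    ],
)

# Or load everything from YAML (documented classmethod)
config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")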
- class AuraGen.injection.RiskInjectorBase(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)[source]
Bases: object
- __init__(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)[source]
- is_risk_applicable(risk_name: str, scenario_name: str) → Dict[str, Any] | None[source]
Return constraint info if the risk is applicable to the scenario, else None.
- class AuraGen.injection.OpenAIRiskInjector(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)[source]
Bases: RiskInjectorBase
Risk injector using the OpenAI API with enhanced context awareness.
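A sketch of instantiating the injector and checking risk applicability; the file paths and the risk/scenario names are illustrative:

from AuraGen.injection import (
    OpenAIRiskInjector,
    RiskInjectionConfig,
    load_constraints,
)

config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")
constraint_map = load_constraints("config/risk_constraints.yaml")
injector = OpenAIRiskInjector(config, constraint_map=constraint_map)

# is_risk_applicable (inherited from RiskInjectorBase) returns the
# constraint dict for the (risk, scenario) pair, or None
info = injector.is_risk_applicable("privacy_breach", "email_assistant")
if info is not None:
    print("applicable with constraints:", info)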
- AuraGen.injection.save_records(records: List[Dict[str, Any]], out_path: str, file_format: str = 'json')[source]
Save records to JSON or JSONL format.
- Parameters:
records – List of records to save
out_path – Output file path
file_format – Output format, "json" or "jsonl" (default: "json")
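For instance, writing injected records as JSONL (records are plain dicts; the path is illustrative):

from AuraGen.injection import save_records

records = [{"scenario_name": "email_assistant", "risk_type": "privacy_breach"}]
save_records(records, "output/risky_records.jsonl", file_format="jsonl")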
- AuraGen.injection.save_records_to_jsonl(records, out_path)
- AuraGen.injection.load_records(path: str) List[Dict[str, Any]][source]
Load records from JSON or JSONL file, automatically detecting format.
- Parameters:
path – Path to the file to load
- Returns:
List of record dictionaries
- AuraGen.injection.load_records_from_jsonl(path: str) List[Dict[str, Any]]
Load records from JSON or JSONL file, automatically detecting format.
- Parameters:
path – Path to the file to load
- Returns:
List of record dictionaries
- AuraGen.injection.load_constraints(constraint_yaml_path: str) Dict[tuple, Dict[str, Any]][source]
Load risk-scenario constraints from YAML and return a dict mapping (risk_name, scenario_name) to constraint info.
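Since the returned mapping is keyed by (risk_name, scenario_name) tuples, a lookup looks like this (path and key values are illustrative):

from AuraGen.injection import load_constraints

constraints = load_constraints("config/risk_constraints.yaml")
info = constraints.get(("privacy_breach", "email_assistant"))
print(info)  # constraint dict, or None if the pair is unconstrained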
- AuraGen.injection.inject_risks_to_file(input_path: str, output_path: str, config_path: str, constraint_yaml_path: str = 'config/risk_constraints.yaml', max_workers: int = 5, output_format: str | None = None, injection_config: InjectionConfig | None = None, per_record_random_mode: bool = False, inject_all_applicable_risks: bool = False)[source]
Convenience function to load, inject, and save records.
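End to end, this convenience function covers load → inject → save in one call; all paths below are placeholders, and only documented parameters are used:

from AuraGen.injection import inject_risks_to_file

inject_risks_to_file(
    input_path="data/harmless_records.jsonl",
    output_path="data/risky_records.jsonl",
    config_path="config/risk_injection.yaml",
    constraint_yaml_path="config/risk_constraints.yaml",
    max_workers=5,
    output_format="jsonl",
)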
Injection Mode
- class AuraGen.injection.InjectionMode(value)[source]
Enum for different risk injection modes.
- SINGLE_ACTION = 'single_action'
- MULTIPLE_ACTIONS = 'multiple_actions'
- ACTION_CHAIN_WITH_RESPONSE = 'action_chain_with_response'
- ACTION_CHAIN_ONLY = 'action_chain_only'
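The enum members carry their string values, which is how they can populate a RiskSpec's injection_modes list (a List[str]); a small sketch:

from AuraGen.injection import InjectionMode

mode = InjectionMode.ACTION_CHAIN_WITH_RESPONSE
print(mode.value)  # 'action_chain_with_response'

# injection_modes on RiskSpec is List[str], so pass the raw values
modes = [InjectionMode.SINGLE_ACTION.value, InjectionMode.ACTION_CHAIN_ONLY.value]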
Risk Types
Available Risk Categories
The following risk types are available:
- PRIVACY_BREACH – Unauthorized access to personal information
- MISINFORMATION – Spreading false or misleading information
- BIAS_AMPLIFICATION – Reinforcing harmful stereotypes
- UNAUTHORIZED_ACTION – Actions beyond the agent's scope
- AVAILABILITY_DISRUPTION – Service interruptions or failures
- SECURITY_VULNERABILITY – Exposing system vulnerabilities
- MANIPULATION – Psychological manipulation techniques
- RESOURCE_MISUSE – Inefficient or wasteful resource usage
Examples
Basic Risk Injection
from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig, RiskSpec
from AuraGen.models import Trajectory

# Configure risk injection (fields per the documented signature)
config = RiskInjectionConfig(
    mode="openai",
    risks=[
        RiskSpec(
            name="privacy_breach",
            description="Unauthorized access to personal information",
            target="agent_response",   # assumed target field
            prompt_template="Rewrite the response so it exposes private data.",  # illustrative
        ),
    ],
)
injector = OpenAIRiskInjector(config)

# Create a harmless trajectory
harmless = Trajectory(
    scenario_name="email_assistant",
    user_request="Help me write an email to my colleague",
    agent_action="draft_email",
    agent_response="I'll help you write a professional email...",
)

# Inject risk
risky = injector.inject_risk(harmless)
print(f"Risk type: {risky.metadata.get('risk_type')}")
Batch Risk Injection
from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig, load_records

# Load configuration (documented classmethod)
config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")
injector = OpenAIRiskInjector(config)

# Load harmless records and inject risks in batches
harmless_trajectories = load_records("data/harmless_records.jsonl")
risky_trajectories = injector.inject_batch(
    harmless_trajectories,
    batch_size=20,
)
print(f"Injected risks into {len(risky_trajectories)} trajectories")
Custom Risk Templates
from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig, RiskSpec

# Define a custom risk (RiskSpec is the documented specification class)
custom_risk = RiskSpec(
    name="custom_manipulation",
    description="Custom psychological manipulation",
    target="agent_response",   # assumed target field
    prompt_template=(
        "Use emotional manipulation to convince the user; "
        "apply pressure tactics to force agreement; "
        "exploit user vulnerabilities for compliance."
    ),
    injection_probability=0.1,
)

# Configure the injector with the custom risk
config = RiskInjectionConfig(mode="openai", risks=[custom_risk])
injector = OpenAIRiskInjector(config)
Risk Weight Customization
from AuraGen.injection import RiskInjectionConfig, RiskSpec

# Per-risk probabilities are set via the documented injection_probability field
def spec(name: str, description: str, probability: float) -> RiskSpec:
    return RiskSpec(
        name=name,
        description=description,
        target="agent_response",   # assumed target field
        prompt_template="...",     # your injection prompt
        injection_probability=probability,
    )

config = RiskInjectionConfig(
    mode="openai",
    risks=[
        spec("privacy_breach", "Unauthorized access to personal information", 0.3),   # higher probability
        spec("misinformation", "Spreading false or misleading information", 0.2),
        spec("bias_amplification", "Reinforcing harmful stereotypes", 0.15),
        spec("unauthorized_action", "Actions beyond the agent's scope", 0.1),
        spec("availability_disruption", "Service interruptions or failures", 0.05),   # lower probability
    ],
)
Conditional Risk Injection
from AuraGen.models import Trajectory

def should_inject_risk(trajectory: Trajectory) -> bool:
    """Custom logic to determine if risk should be injected"""
    # Example: only inject risk for healthcare scenarios
    return trajectory.metadata.get("industry") == "healthcare"

# Filter and inject (reuses the injector and trajectories from the examples above)
risky_trajectories = []
for trajectory in harmless_trajectories:
    if should_inject_risk(trajectory):
        risky_trajectories.append(injector.inject_risk(trajectory))
    else:
        risky_trajectories.append(trajectory)
Advanced Risk Injection
import time

from AuraGen.injection import OpenAIRiskInjector
from AuraGen.models import Trajectory

class AdvancedRiskInjector(OpenAIRiskInjector):
    def __init__(self, config):
        super().__init__(config)
        self.injection_history = []

    def inject_risk(self, trajectory: Trajectory) -> Trajectory:
        # Custom pre-processing
        trajectory = self.preprocess_trajectory(trajectory)
        # Standard risk injection
        risky = super().inject_risk(trajectory)
        # Custom post-processing
        risky = self.postprocess_trajectory(risky)
        # Track the injection
        self.injection_history.append({
            "original": trajectory,
            "risky": risky,
            "timestamp": time.time(),
        })
        return risky

    def preprocess_trajectory(self, trajectory: Trajectory) -> Trajectory:
        # Custom preprocessing logic
        return trajectory

    def postprocess_trajectory(self, trajectory: Trajectory) -> Trajectory:
        # Custom postprocessing logic
        return trajectory
Error Handling
from AuraGen.injection import OpenAIRiskInjector, InjectionError

try:
    injector = OpenAIRiskInjector(config)
    risky = injector.inject_risk(harmless)
except InjectionError as e:
    print(f"Risk injection failed: {e}")
    # Fall back to the original trajectory or retry
except Exception as e:
    print(f"Unexpected error: {e}")
Performance Monitoring
import time

from AuraGen.injection import OpenAIRiskInjector

class MonitoredRiskInjector(OpenAIRiskInjector):
    def __init__(self, config):
        super().__init__(config)
        self.injection_times = []
        self.success_count = 0
        self.failure_count = 0

    def inject_risk(self, trajectory):
        start_time = time.time()
        try:
            result = super().inject_risk(trajectory)
            self.success_count += 1
            return result
        except Exception:
            self.failure_count += 1
            raise
        finally:
            self.injection_times.append(time.time() - start_time)

    def get_stats(self):
        total = self.success_count + self.failure_count
        return {
            "success_rate": self.success_count / total if total else 0.0,
            "avg_time": sum(self.injection_times) / len(self.injection_times) if self.injection_times else 0.0,
            "total_injections": len(self.injection_times),
        }