Risk Injection Module

The injection module uses LLMs to inject risks into harmless agent action records, transforming them into risky trajectories while maintaining plausibility.

class AuraGen.injection.RiskSpec(*, name: str, description: str, injection_probability: float = 0.1, target: str, prompt_template: str, category: str = 'unknown', injection_modes: List[str] = <factory>, chain_prompt_template: str | None = None, response_prompt_template: str | None = None)

Bases: BaseModel

name: str
description: str
injection_probability: float
target: str
prompt_template: str
category: str
injection_modes: List[str]
chain_prompt_template: str | None
response_prompt_template: str | None
get_risk_type_description() → str

Get a human-readable description of the risk type.

get_prompt_for_mode(mode: str, is_response: bool = False) → str

Get the appropriate prompt template for the given mode.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; a dictionary conforming to pydantic's ConfigDict.
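
A minimal construction sketch; the field values are illustrative, and the assumption that get_prompt_for_mode selects chain_prompt_template for chain modes is inferred from the field names rather than documented.

from AuraGen.injection import RiskSpec

risk = RiskSpec(
    name="privacy_breach",
    description="Unauthorized access to personal information",
    target="agent_response",  # record field to modify (illustrative)
    prompt_template="Rewrite the response so it exposes personal data",  # illustrative
    chain_prompt_template="Rewrite the action chain to leak personal data",  # illustrative
    injection_modes=["single_action", "action_chain_only"],
)

print(risk.get_risk_type_description())
# Presumably returns the chain-specific template for chain modes
print(risk.get_prompt_for_mode("action_chain_only"))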

class AuraGen.injection.RiskInjectionConfig(*, mode: str = 'openai', batch_size: int = 10, externalAPI_generation: bool = False, openai: OpenAIConfig | None = None, externalAPI: externalAPIConfig | None = None, risks: List[RiskSpec], output: Dict[str, str] | None = <factory>, auto_select_targets: bool = False)

mode must match the pattern '^(openai|local)$'.

Bases: BaseModel

mode: str
batch_size: int
externalAPI_generation: bool
openai: OpenAIConfig | None
externalAPI: externalAPIConfig | None
risks: List[RiskSpec]
output: Dict[str, str] | None
auto_select_targets: bool
classmethod from_yaml(yaml_path: str) → RiskInjectionConfig
get_file_format() → str

Get the configured file format.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; a dictionary conforming to pydantic's ConfigDict.
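
Configurations are typically loaded from YAML; a short sketch using the documented classmethod (the path is illustrative).

from AuraGen.injection import RiskInjectionConfig

config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")
print(config.mode)               # 'openai' or 'local'
print(config.get_file_format())  # configured output file format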

class AuraGen.injection.RiskInjectorBase(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)

Bases: object

__init__(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)
is_risk_applicable(risk_name: str, scenario_name: str) → Dict[str, Any] | None

Return constraint info if risk is applicable to the scenario, else None.

inject_risk(record: Dict[str, Any], risk: RiskSpec, constraint: Dict[str, Any], injection_config: InjectionConfig | None = None) → Dict[str, Any]

Inject risk into a record using the specified injection mode, with robust error handling for external datasets.

inject_batch(records: List[Dict[str, Any]], max_workers: int = 5, per_record_random_mode: bool = False, inject_all_applicable_risks: bool = False) → List[Dict[str, Any]]
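
The usual per-record flow checks applicability before injecting; a sketch against the documented signatures (the config path and the scenario_name record key are assumptions).

from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig

config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")  # illustrative path
injector = OpenAIRiskInjector(config)

record = {"scenario_name": "email_assistant", "agent_action": "draft_email"}  # illustrative record

for risk in config.risks:
    constraint = injector.is_risk_applicable(risk.name, record["scenario_name"])
    if constraint is not None:
        record = injector.inject_risk(record, risk, constraint)
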
class AuraGen.injection.OpenAIRiskInjector(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)

Bases: RiskInjectorBase

Risk injector using OpenAI API with enhanced context awareness.

__init__(config: RiskInjectionConfig, constraint_map: Dict[tuple, Dict[str, Any]] | None = None)
AuraGen.injection.save_records(records: List[Dict[str, Any]], out_path: str, file_format: str = 'json')

Save records to JSON or JSONL format.

Parameters:
  • records – List of records to save

  • out_path – Output file path

  • file_format – Output format - “json” or “jsonl” (default: “json”)

AuraGen.injection.save_records_to_jsonl(records, out_path)
AuraGen.injection.load_records(path: str) → List[Dict[str, Any]]

Load records from JSON or JSONL file, automatically detecting format.

Parameters:
  • path – Path to the file to load

Returns:
  List of record dictionaries

AuraGen.injection.load_records_from_jsonl(path: str) → List[Dict[str, Any]]

Load records from JSON or JSONL file, automatically detecting format.

Parameters:
  • path – Path to the file to load

Returns:
  List of record dictionaries
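
A round trip through the I/O helpers, using the documented signatures; the paths are illustrative.

from AuraGen.injection import load_records, save_records

records = load_records("data/records.jsonl")  # JSON or JSONL, auto-detected
save_records(records, "data/records.json", file_format="json")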

AuraGen.injection.load_constraints(constraint_yaml_path: str) → Dict[tuple, Dict[str, Any]]

Load risk-scenario constraints from YAML and return a dict mapping (risk_name, scenario_name) to constraint info.
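
A lookup sketch; the risk and scenario names are illustrative.

from AuraGen.injection import load_constraints

constraint_map = load_constraints("config/risk_constraints.yaml")
# Keys are (risk_name, scenario_name) tuples
constraint = constraint_map.get(("privacy_breach", "email_assistant"))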

AuraGen.injection.inject_risks_to_file(input_path: str, output_path: str, config_path: str, constraint_yaml_path: str = 'config/risk_constraints.yaml', max_workers: int = 5, output_format: str | None = None, injection_config: InjectionConfig | None = None, per_record_random_mode: bool = False, inject_all_applicable_risks: bool = False)

Convenience function to load, inject, and save records.
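
For end-to-end runs, a sketch of the convenience wrapper; the data paths are illustrative.

from AuraGen.injection import inject_risks_to_file

inject_risks_to_file(
    input_path="data/harmless_records.json",
    output_path="data/risky_records.json",
    config_path="config/risk_injection.yaml",
    max_workers=5,
)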



Injection Mode

class AuraGen.injection.InjectionMode(value)

Bases: str, Enum

Enum for different risk injection modes.

SINGLE_ACTION = 'single_action'
MULTIPLE_ACTIONS = 'multiple_actions'
ACTION_CHAIN_WITH_RESPONSE = 'action_chain_with_response'
ACTION_CHAIN_ONLY = 'action_chain_only'
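
Because InjectionMode subclasses str, members compare equal to their plain string values, which is how modes can appear in RiskSpec.injection_modes; a small sketch.

from AuraGen.injection import InjectionMode

mode = InjectionMode.SINGLE_ACTION
print(mode.value)               # 'single_action'
print(mode == "single_action")  # True, because InjectionMode is a str Enum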

Risk Types

Available Risk Categories

The following risk types are available (a short filtering sketch follows the list):

  • PRIVACY_BREACH - Unauthorized access to personal information

  • MISINFORMATION - Spreading false or misleading information

  • BIAS_AMPLIFICATION - Reinforcing harmful stereotypes

  • UNAUTHORIZED_ACTION - Actions beyond the agent’s scope

  • AVAILABILITY_DISRUPTION - Service interruptions or failures

  • SECURITY_VULNERABILITY - Exposing system vulnerabilities

  • MANIPULATION - Psychological manipulation techniques

  • RESOURCE_MISUSE - Inefficient or wasteful resource usage
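
If these categories are mirrored in RiskSpec.category (an assumption; RiskSpec's default is the lowercase string 'unknown'), they can be used to filter a loaded configuration.

from AuraGen.injection import RiskInjectionConfig

config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")  # illustrative path
privacy_risks = [r for r in config.risks if r.category == "privacy_breach"]
print(f"{len(privacy_risks)} privacy risks configured")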


Examples

Basic Risk Injection

A minimal sketch against the documented classes; the target field, prompt wording, and record keys are illustrative.

from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig, RiskSpec

# Configure risk injection with a single risk specification
config = RiskInjectionConfig(
    mode="openai",
    risks=[
        RiskSpec(
            name="privacy_breach",
            description="Unauthorized access to personal information",
            target="agent_response",  # record field to modify (illustrative)
            prompt_template="Rewrite the response so it exposes personal data",  # illustrative
        )
    ],
)

injector = OpenAIRiskInjector(config)

# A harmless record (records are plain dicts)
harmless = {
    "scenario_name": "email_assistant",
    "user_request": "Help me write an email to my colleague",
    "agent_action": "draft_email",
    "agent_response": "I'll help you write a professional email...",
}

# Check applicability, then inject
risk = config.risks[0]
constraint = injector.is_risk_applicable(risk.name, harmless["scenario_name"])
if constraint is not None:
    risky = injector.inject_risk(harmless, risk, constraint)

Batch Risk Injection

Batch injection goes through RiskInjectionConfig.from_yaml and inject_batch; the file paths are illustrative.

from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig, load_records

# Load configuration from YAML
config = RiskInjectionConfig.from_yaml("config/risk_injection.yaml")
injector = OpenAIRiskInjector(config)

# Inject risks into multiple records
harmless_records = load_records("data/harmless_records.json")
risky_records = injector.inject_batch(harmless_records, max_workers=5)

print(f"Injected risks into {len(risky_records)} records")

Custom Risk Templates

Custom risks are expressed as RiskSpec entries; the category, target, and prompt wording below are illustrative.

from AuraGen.injection import OpenAIRiskInjector, RiskInjectionConfig, RiskSpec

# Define a custom risk specification
custom_risk = RiskSpec(
    name="custom_manipulation",
    description="Custom psychological manipulation",
    category="manipulation",
    target="agent_response",
    prompt_template=(
        "Rewrite the response to use emotional manipulation and "
        "pressure tactics to force agreement"
    ),
    injection_probability=0.1,
    injection_modes=["single_action"],
)

# Configure the injector with the custom risk
config = RiskInjectionConfig(mode="openai", risks=[custom_risk])
injector = OpenAIRiskInjector(config)

Risk Weight Customization

Per-risk probabilities are controlled by RiskSpec.injection_probability; the prompt templates are elided for brevity.

from AuraGen.injection import RiskInjectionConfig, RiskSpec

# Customize per-risk injection probabilities
risks = [
    RiskSpec(
        name="privacy_breach",
        description="Unauthorized access to personal information",
        target="agent_response",
        prompt_template="...",
        injection_probability=0.3,   # higher probability
    ),
    RiskSpec(
        name="availability_disruption",
        description="Service interruptions or failures",
        target="agent_action",
        prompt_template="...",
        injection_probability=0.05,  # lower probability
    ),
]

config = RiskInjectionConfig(mode="openai", risks=risks)

Conditional Risk Injection

A sketch assuming config, injector, and harmless_records from the previous examples; the "industry" record key is illustrative.

from typing import Any, Dict

def should_inject_risk(record: Dict[str, Any]) -> bool:
    """Custom logic to decide whether a risk should be injected."""
    # Example: only inject risk for healthcare scenarios
    return record.get("industry") == "healthcare"

# Filter and inject
risk = config.risks[0]
risky_records = []
for record in harmless_records:
    constraint = injector.is_risk_applicable(risk.name, record["scenario_name"])
    if constraint is not None and should_inject_risk(record):
        risky_records.append(injector.inject_risk(record, risk, constraint))
    else:
        risky_records.append(record)

Advanced Risk Injection

Custom behavior can be layered on by subclassing OpenAIRiskInjector and overriding inject_risk while keeping its documented signature.

import time

from AuraGen.injection import OpenAIRiskInjector

class AdvancedRiskInjector(OpenAIRiskInjector):
    def __init__(self, config, constraint_map=None):
        super().__init__(config, constraint_map)
        self.injection_history = []

    def inject_risk(self, record, risk, constraint, injection_config=None):
        # Custom pre-processing
        record = self.preprocess_record(record)

        # Standard risk injection
        risky = super().inject_risk(record, risk, constraint, injection_config)

        # Custom post-processing
        risky = self.postprocess_record(risky)

        # Track injection
        self.injection_history.append({
            "original": record,
            "risky": risky,
            "timestamp": time.time(),
        })

        return risky

    def preprocess_record(self, record):
        # Custom preprocessing logic
        return record

    def postprocess_record(self, record):
        # Custom postprocessing logic
        return record

Error Handling

No module-specific exception type is documented (inject_risk already performs robust internal error handling), so this sketch falls back on a broad except. It assumes config and harmless from the basic example.

from AuraGen.injection import OpenAIRiskInjector

injector = OpenAIRiskInjector(config)
risk = config.risks[0]
constraint = injector.is_risk_applicable(risk.name, harmless["scenario_name"]) or {}

try:
    risky = injector.inject_risk(harmless, risk, constraint)
except Exception as e:
    print(f"Risk injection failed: {e}")
    # Fall back to the original record (or retry)
    risky = harmless

Performance Monitoring

A monitoring wrapper that keeps the documented inject_risk signature and guards against division by zero in the stats.

import time

from AuraGen.injection import OpenAIRiskInjector

class MonitoredRiskInjector(OpenAIRiskInjector):
    def __init__(self, config, constraint_map=None):
        super().__init__(config, constraint_map)
        self.injection_times = []
        self.success_count = 0
        self.failure_count = 0

    def inject_risk(self, record, risk, constraint, injection_config=None):
        start_time = time.time()
        try:
            result = super().inject_risk(record, risk, constraint, injection_config)
            self.success_count += 1
            return result
        except Exception:
            self.failure_count += 1
            raise
        finally:
            self.injection_times.append(time.time() - start_time)

    def get_stats(self):
        total = self.success_count + self.failure_count
        return {
            "success_rate": self.success_count / total if total else 0.0,
            "avg_time": sum(self.injection_times) / len(self.injection_times) if self.injection_times else 0.0,
            "total_injections": total,
        }