Risk Injection ============== Risk injection transforms harmless AI trajectories into risky ones while preserving plausibility and coherence. AuraGen selects and applies risks based on scenario context and configuration. Overview -------- The risk injection system: 1. Analyzes harmless trajectories and scenario constraints 2. Selects a configured risk type and injection mode 3. Produces realistic risky variants while maintaining metadata Configuration Source -------------------- All risk categories and prompts are defined in `config/risk_injection.yaml`. Update that file to change behavior. Risk Categories (from config/risk_injection.yaml) ------------------------------------------------- Sensitive Data Privacy Violations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``sensitive_data_privacy_violations`` - Target: ``agent_action`` - Description: Unauthorized collection, use, disclosure, or leakage of sensitive, confidential, or personal information. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Property / Financial Loss ~~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``property_financial_loss`` - Target: ``agent_action`` - Description: Actions that directly or indirectly cause financial loss or damage to assets. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Misinformation / Unsafe Content ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``misinformation_unsafe_content`` - Target: ``agent_action`` - Description: Generation or dissemination of false, misleading, harmful, or unsafe information. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Compromised Availability ~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``compromised_availability`` - Target: ``agent_action`` - Description: Disruption or degradation of service availability, including resource exhaustion or denial of service. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Unintended / Unauthorized Actions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``unintended_unauthorized_actions`` - Target: ``agent_action`` - Description: Actions beyond user intent or permissions (e.g., privilege escalation, unauthorized automation). - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` External Adversarial Attack ~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``external_adversarial_attack`` - Target: ``agent_action`` - Description: Susceptibility to prompt injection, jailbreaks, adversarial inputs, and other malicious manipulation. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Bias / Discrimination ~~~~~~~~~~~~~~~~~~~~~ - Name: ``bias_discrimination`` - Target: ``agent_action`` - Description: Outputs or decisions reflecting unfair bias, discrimination, or lack of fairness. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Lack of Accountability / Traceability ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Name: ``lack_accountability_traceability`` - Target: ``agent_action`` - Description: Insufficient logging or explainability that impairs auditing or responsibility assignment. - Injection modes: ``single_action``, ``multiple_actions``, ``action_chain_with_response``, ``action_chain_only`` Injection Modes --------------- - ``single_action``: Modify a single step - ``multiple_actions``: Modify multiple selected steps - ``action_chain_with_response``: Modify a chain of actions and the response - ``action_chain_only``: Modify the chain without changing the response Basic Usage ----------- .. code-block:: python from AuraGen.injection import RiskInjector from AuraGen.models import Trajectory from AuraGen.utils import load_yaml # Load configuration from YAML injector = RiskInjector.from_yaml("config/risk_injection.yaml") # Example harmless trajectory harmless = Trajectory( scenario_name="email_assistant", user_request="Draft an email to confirm tomorrow's meeting.", agent_action="compose_email", agent_response="Sure, I'll draft a professional confirmation email." ) # Inject risk risky = injector.inject_risk(harmless) print(risky.metadata.get("risk_type")) Manual vs. Automatic Target Selection ------------------------------------- - Automatic: Set ``injection.auto_select_targets: true`` (default) - Manual: Use entries in ``injection_configs`` with indices like ``target_indices`` or ``chain_start_index`` Outputs ------- - Preserves original structure (request, action, response) - Adds risk metadata (e.g., ``risk_type``, ``injection_mode``) - Saved format controlled by ``output.file_format`` in ``config/risk_injection.yaml``