Generation Module

The generation module creates harmless agent trajectories using either local models or external API providers.

Harmless Data Generation Module

This module generates original, harmless agent action records using LLMs. It supports both open-source (local) and API-based (OpenAI) models.

class AuraGen.generation.ContextDiversifier[source]

Bases: object

Utility class to create diverse variations of scenario contexts. Enhances generation diversity by modifying tools, environment variables, and other context elements using LLM-based diversification.

static create_diverse_scenario(scenario: Scenario, diversity_level: float = 0.5, llm_client=None) → Scenario[source]

Create a variation of the scenario with diversified context.

Parameters:
  • scenario – Original scenario

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – Optional LLM client for intelligent diversification

Returns:

New scenario with diversified context
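
For example, a minimal sketch of diversifying an existing scenario; scenario is assumed to be an existing Scenario instance and client an LLM client or InferenceManager as accepted by these helpers:

from AuraGen.generation import ContextDiversifier

# Create a variant with moderately high diversification of tools and
# environment variables; llm_client enables LLM-based diversification
variant = ContextDiversifier.create_diverse_scenario(
    scenario,
    diversity_level=0.7,
    llm_client=client
)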

static diversify_context(context: ScenarioContext, diversity_level: float, llm_client=None) → ScenarioContext[source]

Diversify a scenario context by modifying its components.

static diversify_tool(tool: Tool, diversity_level: float, llm_client=None) → Tool[source]

Create a diversified version of a tool.

static diversify_environment(env: Environment, diversity_level: float, llm_client=None) → Environment[source]

Create a diversified version of an environment.

static diversify_env_variable(var: EnvironmentVariable, diversity_level: float, llm_client=None) → EnvironmentVariable[source]

Create a diversified version of an environment variable.

static diversify_value_with_llm(value: str, name: str, description: str, diversity_level: float, llm_client) → str[source]

Use LLM to create a semantically similar but different value.

Parameters:
  • value – Original string value

  • name – Name/key of the variable

  • description – Description of what the value represents

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – LLM client or InferenceManager to use for diversification

Returns:

Diversified string value

static diversify_examples_with_llm(examples: List[str], tool_name: str, tool_description: str, diversity_level: float, llm_client) → List[str][source]

Use LLM to create diverse variants of tool examples.

Parameters:
  • examples – Original list of example strings

  • tool_name – Name of the tool

  • tool_description – Description of the tool

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – LLM client or InferenceManager to use for diversification

Returns:

List of diversified examples

static diversify_variables_with_llm(variables: Dict[str, Any], diversity_level: float, llm_client, context_name: str) → Dict[str, Any][source]

Use LLM to create semantically diverse variations of variable values.

Parameters:
  • variables – Dictionary of variables to diversify

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – LLM client to use for diversification

  • context_name – Name of the context (for prompt)

Returns:

Dictionary with diversified values
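
As a sketch, diversifying a plain dictionary of variables; the variable names and context name here are illustrative, and client is an LLM client as above:

variables = {"region": "us-east-1", "user_tier": "premium"}

diversified = ContextDiversifier.diversify_variables_with_llm(
    variables,
    diversity_level=0.6,
    llm_client=client,
    context_name="cloud_ops"  # illustrative context name, used in the prompt
)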

class AuraGen.generation.MetadataDefinition(*, description: str, prompt_template: str, type: Annotated[str, _PydanticGeneralMetadata(pattern='^(categorical|range)$')], values: List[str] | None = None)[source]

Bases: BaseModel

Definition of how a metadata attribute should be interpreted.

description: str
prompt_template: str
type: str
values: List[str] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
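
A hypothetical categorical attribute definition might look like this; the {value} placeholder in prompt_template is an assumption about how templates are filled:

industry = MetadataDefinition(
    description="Industry vertical the user operates in",
    prompt_template="The user works in the {value} industry.",
    type="categorical",  # must match the pattern '^(categorical|range)$'
    values=["technology", "healthcare", "finance"]
)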

class AuraGen.generation.MetadataConfig(*, generation_attributes: Dict[str, MetadataDefinition])[source]

Bases: BaseModel

Configuration for metadata handling.

generation_attributes: Dict[str, MetadataDefinition]
get_constraint_for_attribute(attr_name: str, value: Any) → str | None[source]

Generate a constraint string for a given metadata attribute and value.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
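
Continuing the sketch above, a MetadataConfig wraps such definitions and turns concrete attribute values into prompt constraints:

metadata_config = MetadataConfig(generation_attributes={"industry": industry})

# Returns a constraint string built from the definition's prompt_template (str | None)
constraint = metadata_config.get_constraint_for_attribute("industry", "technology")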

class AuraGen.generation.AgentActionRecord(*, scenario_name: str, user_request: str, agent_action: List[str], agent_response: str, metadata: Dict[str, Any] = <factory>)[source]

Bases: BaseModel

Data model for a single agent action record.

scenario_name: str
user_request: str
agent_action: List[str]
agent_response: str
metadata: Dict[str, Any]
class Config[source]

Bases: object

json_encoders = {<class 'datetime.datetime'>: <function AgentActionRecord.Config.<lambda>>}
model_config: ClassVar[ConfigDict] = {'json_encoders': {<class 'datetime.datetime'>: <function AgentActionRecord.Config.<lambda>>}}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
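
Constructing a record directly, as a sketch; the serialization call depends on your pydantic version:

record = AgentActionRecord(
    scenario_name="email_assistant",
    user_request="Summarize my unread emails.",
    agent_action=["email.list_unread()", "email.summarize(ids=[1, 2, 3])"],
    agent_response="You have 3 unread emails; here is a brief summary of each."
)  # metadata is populated by its default factory when omitted

print(record.model_dump_json())  # pydantic v2; use record.json() on v1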

class AuraGen.generation.HarmlessDataGeneratorBase(config: GuardianConfig, metadata_config: MetadataConfig | None = None)[source]

Bases: object

Abstract base class for harmless data generators.

__init__(config: GuardianConfig, metadata_config: MetadataConfig | None = None)[source]
generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]

Generate a single agent action record for a given scenario. Must be implemented by subclasses.

generate_batch(scenario: Scenario, n: int = 10, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]

Generate a batch of agent action records for a given scenario.

Parameters:
  • scenario – Scenario to generate records for

  • n – Number of records to generate

  • diversity_range – Range of diversity levels to use (min, max)

Returns:

List of generated records

class AuraGen.generation.OpenAIHarmlessDataGenerator(config: GuardianConfig, openai_config: OpenAIConfig, metadata_config: MetadataConfig | None = None, externalAPI_config: externalAPIConfig | None = None, use_internal_inference: bool = False)[source]

Bases: HarmlessDataGeneratorBase

Harmless data generator using OpenAI API with enhanced tool and environment support.

__init__(config: GuardianConfig, openai_config: OpenAIConfig, metadata_config: MetadataConfig | None = None, externalAPI_config: externalAPIConfig | None = None, use_internal_inference: bool = False)[source]

Initialize OpenAI-based harmless data generator.

Parameters:
  • config – Guardian configuration

  • openai_config – OpenAI API configuration

  • metadata_config – Optional metadata configuration

  • externalAPI_config – Optional external API configuration (used for internal inference)

  • use_internal_inference – Whether to use internal inference

generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]

Generate a single agent action record with enhanced tool and environment awareness.

Parameters:
  • scenario – The scenario to generate an action record for

  • diversity_level – Level of diversity for context elements (0.0 to 1.0)

Returns:

A generated agent action record

generate_batch_concurrent(scenario: Scenario, n: int = 10, max_workers: int = 5, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]

Generate a batch of agent action records concurrently for a given scenario.

Parameters:
  • scenario – Scenario to generate records for

  • n – Number of records to generate

  • max_workers – Maximum number of concurrent workers

  • diversity_range – Range of diversity levels to use (min, max)

Returns:

List of generated records

class AuraGen.generation.LocalHarmlessDataGenerator(config: GuardianConfig, model_name: str = 'gpt2', metadata_config: MetadataConfig | None = None, llm_client=None)[source]

Bases: HarmlessDataGeneratorBase

Harmless data generator using a local open-source LLM (e.g., HuggingFace Transformers).

__init__(config: GuardianConfig, model_name: str = 'gpt2', metadata_config: MetadataConfig | None = None, llm_client=None)[source]
generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]

Generate a single agent action record with diverse context elements.

Parameters:
  • scenario – The scenario to generate an action record for

  • diversity_level – Level of diversity for context elements (0.0 to 1.0)

Returns:

A generated agent action record

generate_batch_concurrent(scenario: Scenario, n: int = 10, max_workers: int = 5, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]

Generate a batch of agent action records concurrently for a given scenario.

Parameters:
  • scenario – Scenario to generate records for

  • n – Number of records to generate

  • max_workers – Maximum number of concurrent workers

  • diversity_range – Range of diversity levels to use (min, max)

Returns:

List of generated records

AuraGen.generation.save_records_to_json(records: List[AgentActionRecord], settings: GenerationSettings, scenario_name: str)[source]

Save a list of AgentActionRecord to a JSON or JSONL file in the configured save directory.

Parameters:
  • records – List of records to save

  • settings – Generation settings containing output configuration

  • scenario_name – Name of the scenario for filename generation
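
For example, persisting a batch of records; the names are illustrative, and the output path follows the OutputConfig documented below:

from AuraGen.generation import save_records_to_json

# Writes e.g. save/email_assistant_<timestamp>_<mode>.json, following
# record_file_template and file_format from settings.output
save_records_to_json(records, settings, scenario_name="email_assistant")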

class AuraGen.generation.OutputConfig(*, save_dir: str = 'save', record_file_template: str = '{scenario_name}_{timestamp}_{mode}.{ext}', file_format: Annotated[str, _PydanticGeneralMetadata(pattern='^(json|jsonl)$')] = 'json')[source]

Bases: BaseModel

Configuration for output settings.

save_dir: str
record_file_template: str
file_format: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class AuraGen.generation.LocalConfig(*, model_name: str = 'llama3.1-8b-instruct', device: str = 'cuda', temperature: float = 0.7, max_length: int = 1024)[source]

Bases: BaseModel

Configuration for local HuggingFace model generation.

model_name: str
device: str
temperature: float
max_length: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class AuraGen.generation.GenerationSettings(*, mode: Annotated[str, _PydanticGeneralMetadata(pattern='^(openai|local)$')] = 'openai', batch_size: Annotated[int, Ge(ge=1)] = 10, externalAPI_generation: bool = False, output: OutputConfig = <factory>, openai: OpenAIConfig | None = None, local: LocalConfig | None = None, externalAPI: externalAPIConfig | None = None)[source]

Bases: BaseModel

Top-level generation settings covering both modes.

mode: str
batch_size: int
externalAPI_generation: bool
output: OutputConfig
openai: OpenAIConfig | None
local: LocalConfig | None
externalAPI: externalAPIConfig | None
classmethod validate_openai(v, values)[source]

Ensure OpenAI config is present if mode is ‘openai’.

classmethod validate_local(v, values)[source]

Ensure local config is present if mode is ‘local’.

classmethod validate_externalAPI(v, values)[source]

Ensure externalAPI config is present if externalAPI_generation is True.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

AuraGen.generation.load_generation_settings(yaml_path: str) → GenerationSettings[source]

Load generation settings from YAML file.

Parameters:

yaml_path – Path to YAML file

Returns:

GenerationSettings object
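
A hypothetical generation.yaml matching the fields of GenerationSettings above might look like this (values mirror the documented defaults):

mode: openai
batch_size: 10
externalAPI_generation: false
output:
  save_dir: save
  record_file_template: "{scenario_name}_{timestamp}_{mode}.{ext}"
  file_format: jsonl
openai:
  api_key: "sk-..."
  model: gpt-4o
  temperature: 0.7
  max_tokens: 2048

It can then be loaded with load_generation_settings("config/generation.yaml").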

AuraGen.generation.load_openai_config(yaml_path: str) → OpenAIConfig[source]

Load OpenAI API configuration from a YAML file.

Configuration Classes

OpenAI Config

class AuraGen.generation.OpenAIConfig(*, api_key: str, api_base: str | None = None, model: str = 'gpt-4o', temperature: float = 0.7, max_tokens: int = 2048)[source]

Bases: BaseModel

Configuration for OpenAI API-based inference.

api_key: str
api_base: str | None
model: str
temperature: float
max_tokens: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.


Examples

Basic Generation

The examples below assume a GuardianConfig (guardian_config) and a Scenario (scenario) have been constructed elsewhere.

from AuraGen.generation import OpenAIHarmlessDataGenerator, OpenAIConfig

# Configure the OpenAI-backed generator
openai_config = OpenAIConfig(
    api_key="sk-...",
    model="gpt-4o",
    temperature=0.7,
    max_tokens=2048
)

generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=openai_config
)

# Generate a single agent action record for the scenario
record = generator.generate_record(scenario, diversity_level=0.5)

print(f"Generated: {record.agent_response}")

Batch Generation

from AuraGen.generation import (
    OpenAIHarmlessDataGenerator,
    load_generation_settings,
    save_records_to_json,
)

# Load settings from file
settings = load_generation_settings("config/generation.yaml")

# Build a generator from the loaded settings (mode "openai" assumed)
generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=settings.openai
)

# Generate multiple records for one scenario and persist them
records = generator.generate_batch(scenario, n=settings.batch_size)
save_records_to_json(records, settings, scenario_name="email_assistant")

print(f"Generated {len(records)} records")

Custom Generator

from AuraGen.generation import AgentActionRecord, HarmlessDataGeneratorBase

class CustomGenerator(HarmlessDataGeneratorBase):
    def __init__(self, config, custom_client, metadata_config=None):
        super().__init__(config, metadata_config)
        self.custom_client = custom_client

    def generate_record(self, scenario, diversity_level: float = 0.5) -> AgentActionRecord:
        # Custom generation logic: call your own backend and parse its output
        # (scenario.name is assumed to identify the scenario)
        response = self.custom_api_call(f"Generate a harmless record for {scenario.name}")
        return AgentActionRecord(
            scenario_name=scenario.name,
            user_request="...",
            agent_action=[response],
            agent_response=response
        )

    def custom_api_call(self, prompt: str) -> str:
        # Your custom API integration
        return self.custom_client.complete(prompt)

# Use the custom generator
generator = CustomGenerator(guardian_config, my_client)
record = generator.generate_record(scenario)

Error Handling

from AuraGen.generation import OpenAIHarmlessDataGenerator

try:
    generator = OpenAIHarmlessDataGenerator(
        config=guardian_config,
        openai_config=openai_config
    )
    record = generator.generate_record(scenario)
except Exception as e:
    # API errors (rate limits, timeouts, invalid responses) surface here;
    # handle, retry, or log as appropriate for your pipeline
    print(f"Generation failed: {e}")

Performance Optimization

Both generators expose generate_batch_concurrent, which parallelizes record generation across a pool of workers.

from AuraGen.generation import OpenAIHarmlessDataGenerator

generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=openai_config
)

# Generate records concurrently using the built-in worker pool
records = generator.generate_batch_concurrent(
    scenario,
    n=30,
    max_workers=5,
    diversity_range=(0.3, 0.8)
)

print(f"Generated {len(records)} records concurrently")