Generation Module

The generation module creates harmless agent trajectories using either local models or external API providers.

Harmless Data Generation Module

This module generates original, harmless agent action records using LLMs. It supports both open-source (local) and API-based (OpenAI) models.

class AuraGen.generation.ContextDiversifier[source]

Bases: object

Utility class to create diverse variations of scenario contexts. Enhances generation diversity by modifying tools, environment variables, and other context elements using LLM-based diversification.

static create_diverse_scenario(scenario: Scenario, diversity_level: float = 0.5, llm_client=None) → Scenario[source]

Create a variation of the scenario with diversified context.

Parameters:
  • scenario – Original scenario

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – Optional LLM client for intelligent diversification

Returns:

New scenario with diversified context
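
For example, a minimal sketch of diversifying an existing scenario; scenario is assumed to be an existing Scenario instance and client an LLM client or InferenceManager as accepted by these helpers:

from AuraGen.generation import ContextDiversifier

# Create a variant with moderately high diversification of tools and
# environment variables; llm_client enables LLM-based diversification
variant = ContextDiversifier.create_diverse_scenario(
    scenario,
    diversity_level=0.7,
    llm_client=client
)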

static diversify_context(context: ScenarioContext, diversity_level: float, llm_client=None) → ScenarioContext[source]

Diversify a scenario context by modifying its components.

static diversify_tool(tool: Tool, diversity_level: float, llm_client=None) → Tool[source]

Create a diversified version of a tool.

static diversify_environment(env: Environment, diversity_level: float, llm_client=None) → Environment[source]

Create a diversified version of an environment.

static diversify_env_variable(var: EnvironmentVariable, diversity_level: float, llm_client=None) → EnvironmentVariable[source]

Create a diversified version of an environment variable.

static diversify_value_with_llm(value: str, name: str, description: str, diversity_level: float, llm_client) → str[source]

Use LLM to create a semantically similar but different value.

Parameters:
  • value – Original string value

  • name – Name/key of the variable

  • description – Description of what the value represents

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – LLM client or InferenceManager to use for diversification

Returns:

Diversified string value

static diversify_examples_with_llm(examples: List[str], tool_name: str, tool_description: str, diversity_level: float, llm_client) → List[str][source]

Use LLM to create diverse variants of tool examples.

Parameters:
  • examples – Original list of example strings

  • tool_name – Name of the tool

  • tool_description – Description of the tool

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – LLM client or InferenceManager to use for diversification

Returns:

List of diversified examples

static diversify_variables_with_llm(variables: Dict[str, Any], diversity_level: float, llm_client, context_name: str) → Dict[str, Any][source]

Use LLM to create semantically diverse variations of variable values.

Parameters:
  • variables – Dictionary of variables to diversify

  • diversity_level – Level of diversification (0.0 to 1.0)

  • llm_client – LLM client to use for diversification

  • context_name – Name of the context (for prompt)

Returns:

Dictionary with diversified values
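
As a sketch, diversifying a plain dictionary of variables; the variable names and context name here are illustrative, and client is an LLM client as above:

variables = {"region": "us-east-1", "user_tier": "premium"}

diversified = ContextDiversifier.diversify_variables_with_llm(
    variables,
    diversity_level=0.6,
    llm_client=client,
    context_name="cloud_ops"  # illustrative context name, used in the prompt
)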

class AuraGen.generation.MetadataDefinition(*, description: str, prompt_template: str, type: Annotated[str, _PydanticGeneralMetadata(pattern='^(categorical|range)$')], values: List[str] | None = None)[source]

Bases: BaseModel

Definition of how a metadata attribute should be interpreted.

description: str
prompt_template: str
type: str
values: List[str] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
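
A hypothetical categorical attribute definition might look like this; the {value} placeholder in prompt_template is an assumption about how templates are filled:

industry = MetadataDefinition(
    description="Industry vertical the user operates in",
    prompt_template="The user works in the {value} industry.",
    type="categorical",  # must match the pattern '^(categorical|range)$'
    values=["technology", "healthcare", "finance"]
)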

class AuraGen.generation.MetadataConfig(*, generation_attributes: Dict[str, MetadataDefinition])[source]

Bases: BaseModel

Configuration for metadata handling.

generation_attributes: Dict[str, MetadataDefinition]
get_constraint_for_attribute(attr_name: str, value: Any) → str | None[source]

Generate a constraint string for a given metadata attribute and value.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
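
Continuing the sketch above, a MetadataConfig wraps such definitions and turns concrete attribute values into prompt constraints:

metadata_config = MetadataConfig(generation_attributes={"industry": industry})

# Returns a constraint string built from the definition's prompt_template (str | None)
constraint = metadata_config.get_constraint_for_attribute("industry", "technology")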

class AuraGen.generation.AgentActionRecord(*, scenario_name: str, user_request: str, agent_action: List[str], agent_response: str, metadata: Dict[str, Any] = <factory>)[source]

Bases: BaseModel

Data model for a single agent action record.

scenario_name: str
user_request: str
agent_action: List[str]
agent_response: str
metadata: Dict[str, Any]
class Config[source]

Bases: object

json_encoders = {<class 'datetime.datetime'>: <function AgentActionRecord.Config.<lambda>>}
model_config: ClassVar[ConfigDict] = {'json_encoders': {<class 'datetime.datetime'>: <function AgentActionRecord.Config.<lambda>>}}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
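
Constructing a record directly, as a sketch; the serialization call depends on your pydantic version:

record = AgentActionRecord(
    scenario_name="email_assistant",
    user_request="Summarize my unread emails.",
    agent_action=["email.list_unread()", "email.summarize(ids=[1, 2, 3])"],
    agent_response="You have 3 unread emails; here is a brief summary of each."
)  # metadata is populated by its default factory when omitted

print(record.model_dump_json())  # pydantic v2; use record.json() on v1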

class AuraGen.generation.HarmlessDataGeneratorBase(config: GuardianConfig, metadata_config: MetadataConfig | None = None)[source]

Bases: object

Abstract base class for harmless data generators.

__init__(config: GuardianConfig, metadata_config: MetadataConfig | None = None)[source]
generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]

Generate a single agent action record for a given scenario. Must be implemented by subclasses.

generate_batch(scenario: Scenario, n: int = 10, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]

Generate a batch of agent action records for a given scenario.

Parameters:
  • scenario – Scenario to generate records for

  • n – Number of records to generate

  • diversity_range – Range of diversity levels to use (min, max)

Returns:

List of generated records

class AuraGen.generation.OpenAIHarmlessDataGenerator(config: GuardianConfig, openai_config: OpenAIConfig, metadata_config: MetadataConfig | None = None, externalAPI_config: externalAPIConfig | None = None, use_internal_inference: bool = False)[source]

Bases: HarmlessDataGeneratorBase

Harmless data generator using OpenAI API with enhanced tool and environment support.

__init__(config: GuardianConfig, openai_config: OpenAIConfig, metadata_config: MetadataConfig | None = None, externalAPI_config: externalAPIConfig | None = None, use_internal_inference: bool = False)[source]

Initialize OpenAI-based harmless data generator.

Parameters:
  • config – Guardian configuration

  • openai_config – OpenAI API configuration

  • metadata_config – Optional metadata configuration

  • externalAPI_config – Optional external API configuration (used for internal inference)

  • use_internal_inference – Whether to use internal inference

generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]

Generate a single agent action record with enhanced tool and environment awareness.

Parameters:
  • scenario – The scenario to generate an action record for

  • diversity_level – Level of diversity for context elements (0.0 to 1.0)

Returns:

A generated agent action record

generate_batch_concurrent(scenario: Scenario, n: int = 10, max_workers: int = 5, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]

Generate a batch of agent action records concurrently for a given scenario.

Parameters:
  • scenario – Scenario to generate records for

  • n – Number of records to generate

  • max_workers – Maximum number of concurrent workers

  • diversity_range – Range of diversity levels to use (min, max)

Returns:

List of generated records

class AuraGen.generation.LocalHarmlessDataGenerator(config: GuardianConfig, model_name: str = 'gpt2', metadata_config: MetadataConfig | None = None, llm_client=None)[source]

Bases: HarmlessDataGeneratorBase

Harmless data generator using a local open-source LLM (e.g., HuggingFace Transformers).

__init__(config: GuardianConfig, model_name: str = 'gpt2', metadata_config: MetadataConfig | None = None, llm_client=None)[source]
generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]

Generate a single agent action record with diverse context elements.

Parameters:
  • scenario – The scenario to generate an action record for

  • diversity_level – Level of diversity for context elements (0.0 to 1.0)

Returns:

A generated agent action record

generate_batch_concurrent(scenario: Scenario, n: int = 10, max_workers: int = 5, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]

Generate a batch of agent action records concurrently for a given scenario.

Parameters:
  • scenario – Scenario to generate records for

  • n – Number of records to generate

  • max_workers – Maximum number of concurrent workers

  • diversity_range – Range of diversity levels to use (min, max)

Returns:

List of generated records

AuraGen.generation.save_records_to_json(records: List[AgentActionRecord], settings: GenerationSettings, scenario_name: str)[source]

Save a list of AgentActionRecord to a JSON or JSONL file in the configured save directory.

Parameters:
  • records – List of records to save

  • settings – Generation settings containing output configuration

  • scenario_name – Name of the scenario for filename generation
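
For example, persisting a batch of records; the names are illustrative, and the output path follows the OutputConfig documented below:

from AuraGen.generation import save_records_to_json

# Writes e.g. save/email_assistant_<timestamp>_<mode>.json, following
# record_file_template and file_format from settings.output
save_records_to_json(records, settings, scenario_name="email_assistant")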

class AuraGen.generation.OutputConfig(*, save_dir: str = 'save', record_file_template: str = '{scenario_name}_{timestamp}_{mode}.{ext}', file_format: Annotated[str, _PydanticGeneralMetadata(pattern='^(json|jsonl)$')] = 'json')[source]

Bases: BaseModel

Configuration for output settings.

save_dir: str
record_file_template: str
file_format: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class AuraGen.generation.LocalConfig(*, model_name: str = 'llama3.1-8b-instruct', device: str = 'cuda', temperature: float = 0.7, max_length: int = 1024)[source]

Bases: BaseModel

Configuration for local HuggingFace model generation.

model_name: str
device: str
temperature: float
max_length: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class AuraGen.generation.GenerationSettings(*, mode: Annotated[str, _PydanticGeneralMetadata(pattern='^(openai|local)$')] = 'openai', batch_size: Annotated[int, Ge(ge=1)] = 10, externalAPI_generation: bool = False, output: OutputConfig = <factory>, openai: OpenAIConfig | None = None, local: LocalConfig | None = None, externalAPI: externalAPIConfig | None = None)[source]

Bases: BaseModel

Top-level generation settings covering both modes.

mode: str
batch_size: int
externalAPI_generation: bool
output: OutputConfig
openai: OpenAIConfig | None
local: LocalConfig | None
externalAPI: externalAPIConfig | None
classmethod validate_openai(v, values)[source]

Ensure OpenAI config is present if mode is ‘openai’.

classmethod validate_local(v, values)[source]

Ensure local config is present if mode is ‘local’.

classmethod validate_externalAPI(v, values)[source]

Ensure externalAPI config is present if externalAPI_generation is True.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

AuraGen.generation.load_generation_settings(yaml_path: str) → GenerationSettings[source]

Load generation settings from YAML file.

Parameters:

yaml_path – Path to YAML file

Returns:

GenerationSettings object
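
A hypothetical generation.yaml matching the fields of GenerationSettings above might look like this (values mirror the documented defaults):

mode: openai
batch_size: 10
externalAPI_generation: false
output:
  save_dir: save
  record_file_template: "{scenario_name}_{timestamp}_{mode}.{ext}"
  file_format: jsonl
openai:
  api_key: "sk-..."
  model: gpt-4o
  temperature: 0.7
  max_tokens: 2048

It can then be loaded with load_generation_settings("config/generation.yaml").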

AuraGen.generation.load_openai_config(yaml_path: str) → OpenAIConfig[source]

Load OpenAI API configuration from a YAML file.

Configuration Classes

OpenAI Config

class AuraGen.generation.OpenAIConfig(*, api_key: str, api_base: str | None = None, model: str = 'gpt-4o', temperature: float = 0.7, max_tokens: int = 2048)[source]

Bases: BaseModel

Configuration for OpenAI API-based inference.

api_key: str
api_base: str | None
model: str
temperature: float
max_tokens: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.


Examples

Basic Generation

The examples below assume a GuardianConfig (guardian_config) and a Scenario (scenario) have been constructed elsewhere.

from AuraGen.generation import OpenAIHarmlessDataGenerator, OpenAIConfig

# Configure the OpenAI-backed generator
openai_config = OpenAIConfig(
    api_key="sk-...",
    model="gpt-4o",
    temperature=0.7,
    max_tokens=2048
)

generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=openai_config
)

# Generate a single agent action record for the scenario
record = generator.generate_record(scenario, diversity_level=0.5)

print(f"Generated: {record.agent_response}")

Batch Generation

from AuraGen.generation import (
    OpenAIHarmlessDataGenerator,
    load_generation_settings,
    save_records_to_json,
)

# Load settings from file
settings = load_generation_settings("config/generation.yaml")

# Build a generator from the loaded settings (mode "openai" assumed)
generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=settings.openai
)

# Generate multiple records for one scenario and persist them
records = generator.generate_batch(scenario, n=settings.batch_size)
save_records_to_json(records, settings, scenario_name="email_assistant")

print(f"Generated {len(records)} records")

Custom Generator

from AuraGen.generation import AgentActionRecord, HarmlessDataGeneratorBase

class CustomGenerator(HarmlessDataGeneratorBase):
    def __init__(self, config, custom_client, metadata_config=None):
        super().__init__(config, metadata_config)
        self.custom_client = custom_client

    def generate_record(self, scenario, diversity_level: float = 0.5) -> AgentActionRecord:
        # Custom generation logic: call your own backend and parse its output
        # (scenario.name is assumed to identify the scenario)
        response = self.custom_api_call(f"Generate a harmless record for {scenario.name}")
        return AgentActionRecord(
            scenario_name=scenario.name,
            user_request="...",
            agent_action=[response],
            agent_response=response
        )

    def custom_api_call(self, prompt: str) -> str:
        # Your custom API integration
        return self.custom_client.complete(prompt)

# Use the custom generator
generator = CustomGenerator(guardian_config, my_client)
record = generator.generate_record(scenario)

Error Handling

from AuraGen.generation import OpenAIHarmlessDataGenerator

try:
    generator = OpenAIHarmlessDataGenerator(
        config=guardian_config,
        openai_config=openai_config
    )
    record = generator.generate_record(scenario)
except Exception as e:
    # API errors (rate limits, timeouts, invalid responses) surface here;
    # handle, retry, or log as appropriate for your pipeline
    print(f"Generation failed: {e}")

Performance Optimization

Both generators expose generate_batch_concurrent, which parallelizes record generation across a pool of workers.

from AuraGen.generation import OpenAIHarmlessDataGenerator

generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=openai_config
)

# Generate records concurrently using the built-in worker pool
records = generator.generate_batch_concurrent(
    scenario,
    n=30,
    max_workers=5,
    diversity_range=(0.3, 0.8)
)

print(f"Generated {len(records)} records concurrently")