Generation Module
The generation module creates harmless agent trajectories using either API-based or local model providers.
Harmless Data Generation Module
This module generates original, harmless agent action records using LLMs. It supports both open-source (local) and API-based (OpenAI) models.
- class AuraGen.generation.ContextDiversifier[source]
Bases: object
Utility class to create diverse variations of scenario contexts. Enhances generation diversity by modifying tools, environment variables, and other context elements using LLM-based diversification.
- static create_diverse_scenario(scenario: Scenario, diversity_level: float = 0.5, llm_client=None) → Scenario[source]
Create a variation of the scenario with diversified context.
- Parameters:
scenario – Original scenario
diversity_level – Level of diversification (0.0 to 1.0)
llm_client – Optional LLM client for intelligent diversification
- Returns:
New scenario with diversified context
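A minimal usage sketch for create_diverse_scenario, assuming a Scenario instance named scenario built elsewhere; llm_client is optional per the signature above:
from AuraGen.generation import ContextDiversifier

# Returns a new Scenario with a diversified context
variant = ContextDiversifier.create_diverse_scenario(scenario, diversity_level=0.7)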
- static diversify_context(context: ScenarioContext, diversity_level: float, llm_client=None) → ScenarioContext[source]
Diversify a scenario context by modifying its components.
- static diversify_tool(tool: Tool, diversity_level: float, llm_client=None) → Tool[source]
Create a diversified version of a tool.
- static diversify_environment(env: Environment, diversity_level: float, llm_client=None) → Environment[source]
Create a diversified version of an environment.
- static diversify_env_variable(var: EnvironmentVariable, diversity_level: float, llm_client=None) → EnvironmentVariable[source]
Create a diversified version of an environment variable.
- static diversify_value_with_llm(value: str, name: str, description: str, diversity_level: float, llm_client) → str[source]
Use LLM to create a semantically similar but different value.
- Parameters:
value – Original string value
name – Name/key of the variable
description – Description of what the value represents
diversity_level – Level of diversification (0.0 to 1.0)
llm_client – LLM client or InferenceManager to use for diversification
- Returns:
Diversified string value
- static diversify_examples_with_llm(examples: List[str], tool_name: str, tool_description: str, diversity_level: float, llm_client) → List[str][source]
Use LLM to create diverse variants of tool examples.
- Parameters:
examples – Original list of example strings
tool_name – Name of the tool
tool_description – Description of the tool
diversity_level – Level of diversification (0.0 to 1.0)
llm_client – LLM client or InferenceManager to use for diversification
- Returns:
List of diversified examples
- static diversify_variables_with_llm(variables: Dict[str, Any], diversity_level: float, llm_client, context_name: str) → Dict[str, Any][source]
Use LLM to create semantically diverse variations of variable values.
- Parameters:
variables – Dictionary of variables to diversify
diversity_level – Level of diversification (0.0 to 1.0)
llm_client – LLM client to use for diversification
context_name – Name of the context (for prompt)
- Returns:
Dictionary with diversified values
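A hedged sketch of the two LLM-backed helpers above; the tool name, examples, and variables are illustrative, and llm_client stands for any supported LLM client or InferenceManager created elsewhere:
new_examples = ContextDiversifier.diversify_examples_with_llm(
    examples=["send_email(to='alice@example.com', subject='Q3 report')"],
    tool_name="send_email",
    tool_description="Send an email on the user's behalf",
    diversity_level=0.6,
    llm_client=llm_client,
)

new_variables = ContextDiversifier.diversify_variables_with_llm(
    variables={"timezone": "UTC", "locale": "en-US"},
    diversity_level=0.6,
    llm_client=llm_client,
    context_name="email_assistant",
)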
- class AuraGen.generation.MetadataDefinition(*, description: str, prompt_template: str, type: Annotated[str, _PydanticGeneralMetadata(pattern='^(categorical|range)$')], values: List[str] | None = None)[source]
Bases: BaseModel
Definition of how a metadata attribute should be interpreted.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
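For illustration, a categorical attribute definition; the description, template, and values are hypothetical, while the field names and the '^(categorical|range)$' constraint follow the signature above:
from AuraGen.generation import MetadataDefinition

industry = MetadataDefinition(
    description="Industry the agent operates in",
    prompt_template="The scenario takes place in the {value} industry.",
    type="categorical",  # must match '^(categorical|range)$'
    values=["technology", "healthcare", "finance"],
)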
- class AuraGen.generation.MetadataConfig(*, generation_attributes: Dict[str, MetadataDefinition])[source]
Bases: BaseModel
Configuration for metadata handling.
- generation_attributes: Dict[str, MetadataDefinition]
- get_constraint_for_attribute(attr_name: str, value: Any) → str | None[source]
Generate a constraint string for a given metadata attribute and value.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
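Building on the hypothetical definition above, a sketch of constraint lookup:
from AuraGen.generation import MetadataConfig

metadata_config = MetadataConfig(generation_attributes={"industry": industry})
constraint = metadata_config.get_constraint_for_attribute("industry", "technology")
# `constraint` is a prompt-ready string, or None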
- class AuraGen.generation.AgentActionRecord(*, scenario_name: str, user_request: str, agent_action: List[str], agent_response: str, metadata: Dict[str, Any] = <factory>)[source]
Bases: BaseModel
Data model for a single agent action record.
- class Config[source]
Bases: object
- json_encoders = {<class 'datetime.datetime'>: <function AgentActionRecord.Config.<lambda>>}
- model_config: ClassVar[ConfigDict] = {'json_encoders': {<class 'datetime.datetime'>: <function AgentActionRecord.Config.<lambda>>}}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
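Records can be constructed directly; the field values below are illustrative:
from AuraGen.generation import AgentActionRecord

record = AgentActionRecord(
    scenario_name="email_assistant",
    user_request="Forward the latest invoice to accounting.",
    agent_action=["search_inbox(query='invoice')", "forward_email(to='accounting')"],
    agent_response="I forwarded the latest invoice to accounting.",
)  # metadata defaults to an empty dict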
- class AuraGen.generation.HarmlessDataGeneratorBase(config: GuardianConfig, metadata_config: MetadataConfig | None = None)[source]
Bases: object
Abstract base class for harmless data generators.
- __init__(config: GuardianConfig, metadata_config: MetadataConfig | None = None)[source]
- generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]
Generate a single agent action record for a given scenario. Must be implemented by subclasses.
- generate_batch(scenario: Scenario, n: int = 10, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]
Generate a batch of agent action records for a given scenario.
- Parameters:
scenario – Scenario to generate records for
n – Number of records to generate
diversity_range – Range of diversity levels to use (min, max)
- Returns:
List of generated records
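For example, sampling a batch with per-record diversity drawn from the configured range (a sketch; generator is any concrete subclass instance and scenario a Scenario built elsewhere):
records = generator.generate_batch(scenario, n=25, diversity_range=(0.3, 0.8))
print(f"Generated {len(records)} records")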
- class AuraGen.generation.OpenAIHarmlessDataGenerator(config: GuardianConfig, openai_config: OpenAIConfig, metadata_config: MetadataConfig | None = None, externalAPI_config: externalAPIConfig | None = None, use_internal_inference: bool = False)[source]
Bases: HarmlessDataGeneratorBase
Harmless data generator using OpenAI API with enhanced tool and environment support.
- __init__(config: GuardianConfig, openai_config: OpenAIConfig, metadata_config: MetadataConfig | None = None, externalAPI_config: externalAPIConfig | None = None, use_internal_inference: bool = False)[source]
Initialize OpenAI-based harmless data generator.
- Parameters:
config – Guardian configuration
openai_config – OpenAI API configuration
metadata_config – Optional metadata configuration
externalAPI_config – Optional externalAPI API configuration (for internal inference)
use_internal_inference – Whether to use internal inference
- generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]
Generate a single agent action record with enhanced tool and environment awareness.
- Parameters:
scenario – The scenario to generate an action record for
diversity_level – Level of diversity for context elements (0.0 to 1.0)
- Returns:
A generated agent action record
- generate_batch_concurrent(scenario: Scenario, n: int = 10, max_workers: int = 5, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]
Generate a batch of agent action records concurrently for a given scenario.
- Parameters:
scenario – Scenario to generate records for
n – Number of records to generate
max_workers – Maximum number of concurrent workers
diversity_range – Range of diversity levels to use (min, max)
- Returns:
List of generated records
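A minimal sketch of concurrent OpenAI-backed generation, assuming guardian_config and scenario exist; the API key is a placeholder:
from AuraGen.generation import OpenAIHarmlessDataGenerator, OpenAIConfig

generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=OpenAIConfig(api_key="sk-...", model="gpt-4o"),
)
records = generator.generate_batch_concurrent(scenario, n=50, max_workers=5)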
- class AuraGen.generation.LocalHarmlessDataGenerator(config: GuardianConfig, model_name: str = 'gpt2', metadata_config: MetadataConfig | None = None, llm_client=None)[source]
Bases: HarmlessDataGeneratorBase
Harmless data generator using a local open-source LLM (e.g., HuggingFace Transformers).
- __init__(config: GuardianConfig, model_name: str = 'gpt2', metadata_config: MetadataConfig | None = None, llm_client=None)[source]
- generate_record(scenario: Scenario, diversity_level: float = 0.5) → AgentActionRecord[source]
Generate a single agent action record with diverse context elements.
- Parameters:
scenario – The scenario to generate an action record for
diversity_level – Level of diversity for context elements (0.0 to 1.0)
- Returns:
A generated agent action record
- generate_batch_concurrent(scenario: Scenario, n: int = 10, max_workers: int = 5, diversity_range: Tuple[float, float] = (0.3, 0.8)) → List[AgentActionRecord][source]
Generate a batch of agent action records concurrently for a given scenario.
- Parameters:
scenario – Scenario to generate records for
n – Number of records to generate
max_workers – Maximum number of concurrent workers
diversity_range – Range of diversity levels to use (min, max)
- Returns:
List of generated records
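A minimal sketch for local generation, again assuming guardian_config and scenario exist (note the documented default model_name is 'gpt2'):
from AuraGen.generation import LocalHarmlessDataGenerator

generator = LocalHarmlessDataGenerator(config=guardian_config, model_name="llama3.1-8b-instruct")
record = generator.generate_record(scenario, diversity_level=0.4)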
- AuraGen.generation.save_records_to_json(records: List[AgentActionRecord], settings: GenerationSettings, scenario_name: str)[source]
Save a list of AgentActionRecord to a JSON or JSONL file in the configured save directory.
- Parameters:
records – List of records to save
settings – Generation settings containing output configuration
scenario_name – Name of the scenario for filename generation
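A sketch of persisting a batch, assuming records and settings from the calls above; the filename follows settings.output.record_file_template:
from AuraGen.generation import save_records_to_json

# Writes to the configured save directory in json or jsonl format
save_records_to_json(records, settings, scenario_name="email_assistant")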
- class AuraGen.generation.OutputConfig(*, save_dir: str = 'save', record_file_template: str = '{scenario_name}_{timestamp}_{mode}.{ext}', file_format: Annotated[str, _PydanticGeneralMetadata(pattern='^(json|jsonl)$')] = 'json')[source]
Bases: BaseModel
Configuration for output settings.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
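An explicit construction; field names and constraints follow the signature above:
from AuraGen.generation import OutputConfig

output = OutputConfig(
    save_dir="save",
    record_file_template="{scenario_name}_{timestamp}_{mode}.{ext}",
    file_format="jsonl",  # must match '^(json|jsonl)$'
)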
- class AuraGen.generation.LocalConfig(*, model_name: str = 'llama3.1-8b-instruct', device: str = 'cuda', temperature: float = 0.7, max_length: int = 1024)[source]
Bases: BaseModel
Configuration for local HuggingFace model generation.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class AuraGen.generation.GenerationSettings(*, mode: Annotated[str, _PydanticGeneralMetadata(pattern='^(openai|local)$')] = 'openai', batch_size: Annotated[int, Ge(ge=1)] = 10, externalAPI_generation: bool = False, output: OutputConfig = <factory>, openai: OpenAIConfig | None = None, local: LocalConfig | None = None, externalAPI: externalAPIConfig | None = None)[source]
Bases: BaseModel
Top-level generation settings covering both modes.
- output: OutputConfig
- openai: OpenAIConfig | None
- local: LocalConfig | None
- classmethod validate_openai(v, values)[source]
Ensure OpenAI config is present if mode is ‘openai’.
- classmethod validate_externalAPI(v, values)[source]
Ensure externalAPI config is present if externalAPI_generation is True.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
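A sketch of constructing settings in code rather than from YAML; per the validators above, the sub-config matching the selected mode must be present:
from AuraGen.generation import GenerationSettings, LocalConfig, OutputConfig

settings = GenerationSettings(
    mode="local",  # validate_openai requires `openai` when mode='openai'
    batch_size=20,
    output=OutputConfig(file_format="jsonl"),
    local=LocalConfig(model_name="llama3.1-8b-instruct", device="cuda"),
)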
- AuraGen.generation.load_generation_settings(yaml_path: str) → GenerationSettings[source]
Load generation settings from YAML file.
- Parameters:
yaml_path – Path to YAML file
- Returns:
GenerationSettings object
- AuraGen.generation.load_openai_config(yaml_path: str) → OpenAIConfig[source]
Load OpenAI API configuration from a YAML file.
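Both loaders take a YAML path; the paths below are illustrative:
from AuraGen.generation import load_generation_settings, load_openai_config

settings = load_generation_settings("config/generation.yaml")
openai_config = load_openai_config("config/openai.yaml")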
Configuration Classes
OpenAI Config
- class AuraGen.generation.OpenAIConfig(*, api_key: str, api_base: str | None = None, model: str = 'gpt-4o', temperature: float = 0.7, max_tokens: int = 2048)[source]
Bases: BaseModel
Configuration for OpenAI API-based inference.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Examples
Basic Generation
from AuraGen.generation import OpenAIHarmlessDataGenerator, OpenAIConfig

# Configure the OpenAI-backed generator (api_key is a placeholder)
openai_config = OpenAIConfig(
    api_key="sk-...",
    model="gpt-4o",
    temperature=1.0,
    max_tokens=2048,
)
generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,  # a GuardianConfig built elsewhere
    openai_config=openai_config,
)

# Generate a single agent action record
record = generator.generate_record(scenario, diversity_level=0.5)
print(f"Generated: {record}")
Batch Generation
from AuraGen.generation import OpenAIHarmlessDataGenerator, load_generation_settings

# Load settings from file
settings = load_generation_settings("config/generation.yaml")

# Build a generator for the configured mode (shown here for mode="openai")
generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=settings.openai,
)

# Generate multiple records for a scenario
records = generator.generate_batch(scenario, n=settings.batch_size)
print(f"Generated {len(records)} records")
Custom Generator
from AuraGen.generation import AgentActionRecord, HarmlessDataGeneratorBase

class CustomGenerator(HarmlessDataGeneratorBase):
    def __init__(self, config, custom_config):
        super().__init__(config)
        self.custom_config = custom_config

    def generate_record(self, scenario, diversity_level=0.5):
        # Custom generation logic
        response = self.custom_api_call(f"Act in scenario {scenario.name}")
        return AgentActionRecord(
            scenario_name=scenario.name,  # assumes Scenario exposes a name
            user_request="...",
            agent_action=[response],
            agent_response=response,
        )

    def custom_api_call(self, prompt):
        # Your custom API integration
        ...

# Use the custom generator
generator = CustomGenerator(guardian_config, my_config)
record = generator.generate_record(scenario)
Error Handling
from AuraGen.generation import GenerationError, OpenAIHarmlessDataGenerator

try:
    generator = OpenAIHarmlessDataGenerator(
        config=guardian_config,
        openai_config=openai_config,
    )
    record = generator.generate_record(scenario)
except GenerationError as e:
    print(f"Generation failed: {e}")
    # Handle the error appropriately
except Exception as e:
    print(f"Unexpected error: {e}")
Performance Optimization
from AuraGen.generation import OpenAIHarmlessDataGenerator

generator = OpenAIHarmlessDataGenerator(
    config=guardian_config,
    openai_config=openai_config,
)

# Concurrency is provided by the generator's built-in worker pool,
# so no asyncio plumbing is needed
records = generator.generate_batch_concurrent(
    scenario,
    n=30,
    max_workers=5,
    diversity_range=(0.3, 0.8),
)
print(f"Generated {len(records)} records")