item = Item(id=None, embedding=[0.1], item=None, score=None, data=None)
assert item.id
old_id = item.id
item = Item.model_validate(item)
assert item.id == old_id
Schemas
High Level Overview
emb_opt is designed to run hill climbing algorithms in embedding spaces. In practice, this means we are searching through some explicit vector database or the implicit embedding space of some generative model, which we refer to as a DataSource. We use "continuous space" to refer to embeddings and "discrete space" to refer to the discrete things those embeddings represent.
The DataSource is queried with a Query. The Query contains a query embedding and optionally an item (some discrete thing represented by the embedding). The DataSource uses the Query to return a list of Item objects. An Item represents a discrete thing returned by the DataSource.
The Item results are optionally sent to a Filter, which removes results based on some True/False criteria.
The Item results are then sent to a Score which assigns some numeric score value to each Item.
The Query and scored Item results are sent to an Update, which uses the scored items to generate a new Query. Update methods are denoted as discrete or continuous. Continuous updates generate new queries purely in embedding space (e.g. by averaging Item embeddings). Discrete updates create new queries from specific Item results, so each query can have a specific item associated with it (not possible with continuous updates). Continuous updates generally converge faster, but certain types of DataSource may require a discrete item query and are therefore incompatible with continuous updates.
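The difference between the two update styles can be illustrated with a minimal sketch. It does not use emb_opt's own Update classes: scored results are represented as plain `(item, embedding, score)` tuples, and the function names `continuous_update` and `discrete_update` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical stand-in: (item, embedding, score) triples instead of emb_opt Item objects.
ScoredItem = Tuple[str, List[float], float]


def continuous_update(results: List[ScoredItem], k: int = 2) -> List[float]:
    """Average the embeddings of the k highest-scoring items into one new query embedding."""
    top = sorted(results, key=lambda r: r[2], reverse=True)[:k]
    dim = len(top[0][1])
    return [sum(r[1][d] for r in top) / len(top) for d in range(dim)]


def discrete_update(results: List[ScoredItem]) -> Tuple[str, List[float]]:
    """Use the single highest-scoring item as the new query, keeping its item attached."""
    best = max(results, key=lambda r: r[2])
    return best[0], best[1]


results = [
    ("item_a", [0.0, 0.0], 0.1),
    ("item_b", [1.0, 0.0], 0.9),
    ("item_c", [0.0, 1.0], 0.7),
]

new_embedding = continuous_update(results, k=2)          # mean of item_b and item_c
new_item, new_item_embedding = discrete_update(results)  # item_b and its embedding
```

The continuous result is an embedding that may not correspond to any stored item, while the discrete result always carries a concrete item with it.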
Some Update methods generate multiple new queries. To control the total number of queries, a Prune step is optionally added before the Update step.
The general flow is:
1. Start with a Batch of Query objects
2. Query the DataSource
3. (optional) Send results to the Filter
4. Send results to the Score
5. (optional) Prune queries
6. Use scored results to Update to a new set of queries
The schemas defined here specify the required input/output structure for each step, so that any step can be replaced by a custom plugin.
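As a rough illustration of this flow, the sketch below runs the loop on a toy two-dimensional grid using plain Python lists. Everything in it (the `DATABASE` grid, `data_source`, `keep`, `score`, `update`, `search`) is a hypothetical stand-in rather than emb_opt API, and the prune step is omitted because only a single query is tracked.

```python
from typing import List

Vector = List[float]

# Hypothetical toy "database": an 11 x 11 grid of points in the unit square.
DATABASE: List[Vector] = [[x / 10.0, y / 10.0] for x in range(11) for y in range(11)]


def data_source(query: Vector, k: int = 5) -> List[Vector]:
    """Return the k database vectors closest to the query (squared Euclidean distance)."""
    return sorted(DATABASE, key=lambda v: sum((a - b) ** 2 for a, b in zip(v, query)))[:k]


def keep(v: Vector) -> bool:
    """Filter: a True/False criterion; here points with a negative coordinate are dropped."""
    return all(c >= 0.0 for c in v)


def score(v: Vector) -> float:
    """Score: higher is better; the optimum of this toy objective is at [1.0, 1.0]."""
    return -sum((c - 1.0) ** 2 for c in v)


def update(results: List[Vector], scores: List[float], top: int = 2) -> Vector:
    """Continuous update: average the `top` best-scoring result embeddings."""
    ranked = [v for _, v in sorted(zip(scores, results), key=lambda p: p[0], reverse=True)]
    best = ranked[:top]
    return [sum(v[d] for v in best) / len(best) for d in range(len(best[0]))]


def search(start: Vector, iterations: int = 20) -> Vector:
    query = start
    for _ in range(iterations):
        results = [v for v in data_source(query) if keep(v)]  # query, then filter
        scores = [score(v) for v in results]                   # score
        query = update(results, scores)                        # update to a new query
    return query


final_query = search([0.0, 0.0])
```

Each iteration moves the query toward the better-scoring neighbours it just retrieved, which is the hill climbing behaviour described above.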
Data Objects
Internal Data
InternalData tracks internal information as part of the embedding search. This data is managed internally, but may be useful for certain Prune or Update configurations.
InternalData.removed denotes if the related Item or Query has been removed or invalidated by some step (see DataSourceResponse, FilterResponse, ScoreResponse, PruneResponse)
InternalData.removal_reason details the removal reason
InternalData.parent_id is the ID string of the parent Query to the related Item or Query object. InternalData.parent_id always points to a Query, never an Item
InternalData.collection_id groups Item and Query objects that come from the same initial Query. This is useful when an Update step generates multiple new queries from a single input
InternalData.iteration denotes which iteration of the search created the related Item or Query
InteralData
InteralData (removed:bool, removal_reason:Optional[str], parent_id:Optional[str], collection_id:Optional[int], iteration:Optional[int])
Internal Data Tracking
Item
The Item schema is the basic “object” or “thing” we are looking for. The goal of emb_opt is to discover an Item with a high score.
Item.id is the index/ID of the item (for example the database index). If no ID is provided, one will be created as a UUID. emb_opt assumes Item.id is unique to the item.
Item.item is the discrete thing itself.
Item.score is the score of the item. emb_opt assumes a hill climbing scenario where higher scores are better than lower scores.
Item.data is a dictionary container for any other information associated with the item (e.g. other fields returned from a database query).
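The ID behaviour described above can be sketched with a standard-library dataclass. `SketchItem` is a hypothetical stand-in (emb_opt's Item is a Pydantic model), and storing the generated UUID as a string is an assumption made for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union


@dataclass
class SketchItem:
    """Hypothetical stand-in for Item, with the same field names."""
    embedding: List[float]
    id: Optional[Union[int, str]] = None
    item: Optional[Any] = None
    score: Optional[float] = None
    data: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Generate an ID only when the caller did not supply one.
        if self.id is None:
            self.id = str(uuid.uuid4())


generated = SketchItem(embedding=[0.1])                              # ID created automatically
explicit = SketchItem(embedding=[0.1], id=42, item="molecule_42")   # ID kept as given
```

Because an existing ID is never overwritten, re-validating or copying an item keeps its identity stable.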
Item
Item (id:Union[int,str,NoneType], item:Optional[Any], embedding:List[float], score:Optional[float], data:Optional[dict], **extra_data:Any)
Usage docs: https://docs.pydantic.dev/2.7/concepts/models/
A base class for creating Pydantic models.
Attributes:
__class_vars__: The names of classvars defined on the model.
__private_attributes__: Metadata about the private attributes of the model.
__signature__: The signature for instantiating the model.
__pydantic_complete__: Whether model building is completed, or if there are still undefined fields.
__pydantic_core_schema__: The pydantic-core schema used to build the SchemaValidator and SchemaSerializer.
__pydantic_custom_init__: Whether the model has a custom `__init__` function.
__pydantic_decorators__: Metadata containing the decorators defined on the model.
This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.
__pydantic_generic_metadata__: Metadata for generic models; contains data used for a similar purpose to
__args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.
__pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.
__pydantic_post_init__: The name of the post-init method for the model, if defined.
__pydantic_root_model__: Whether the model is a `RootModel`.
__pydantic_serializer__: The pydantic-core SchemaSerializer used to dump instances of the model.
__pydantic_validator__: The pydantic-core SchemaValidator used to validate instances of the model.
__pydantic_extra__: An instance attribute with the values of extra fields from validation when
`model_config['extra'] == 'allow'`.
__pydantic_fields_set__: An instance attribute with the names of fields explicitly set.
__pydantic_private__: Instance attribute with the values of private attributes set on the model instance.
| | Type | Details |
|---|---|---|
| data | Any | |
| Returns | None | |
Query
A Query is the basic object for searching a DataSource and holding Item results returned by the search.
Query.item is an (optional) discrete item associated with the Query. This is populated automatically when the query is created from an Item via Query.from_item.
Query.embedding is the embedding associated with the Query.
Query.data is a dictionary container for any other information associated with the query.
Query.query_results is a list of Item objects returned from a query.
Query
Query (item:Optional[Any], embedding:List[float], data:Optional[dict], query_results:Optional[list[__main__.Item]], **extra_data:Any)
A Query holds Items, tracks parent/child relationships, and allows for convenient iteration
query = Query.from_minimal(embedding=[0.1])
query.update_internal(collection_id=0) # add collection ID
query_results = [
    Item.from_minimal(item='item1', embedding=[0.1]),
    Item.from_minimal(item='item2', embedding=[0.1]),
]
query.add_query_results(query_results)
# iteration over query results
assert len([i for i in query]) == 2
# propagation of query parent data
for query_result in query:
    assert query_result.internal.parent_id == query.id
    assert query_result.internal.collection_id == query.internal.collection_id
Items may be removed by various steps. Removed items are kept within the Query for logging purposes. Query.valid_results and Query.enumerate_query_results allow us to automatically skip removed items during iteration
assert len(list(query.valid_results())) == 2
query.query_results[0].update_internal(removed=True) # set first result to removed
assert len(list(query.valid_results())) == 1
assert len(list(query.enumerate_query_results())) == 1
assert len(list(query.enumerate_query_results(skip_removed=False))) == 2
query.query_results[1].update_internal(removed=True) # set second result to removed
query.update_internal() # update query internal
assert query.internal.removed # query sets itself to removed when all query results are removed
Queries can be created from another Query or another Item, with automatic data propagation between them
# create query from item
item = Item.from_minimal(item='test_item', embedding=[0.1])
query = Query.from_item(item)
assert query.item == item.item
# create query from query
query = Query.from_minimal(embedding=[0.1])
new_query = Query.from_parent_query(embedding=[0.2], parent_query=query)
assert new_query.internal.parent_id == query.id
Batch
The Batch object holds a list of Query objects and provides convenience functions for iterating over queries and query results
Batch
Batch (queries:List[__main__.Query])
A Batch allows us to iterate over the queries and items in the batch in several ways
def build_test_batch(n_queries, n_items):
    queries = []
    for i in range(n_queries):
        query = Query.from_minimal(item=f'query_{i}', embedding=[0.1])
        for j in range(n_items):
            item = Item.from_minimal(item=f'item_{j}', embedding=[0.1])
            query.add_query_results([item])
        queries.append(query)
    return Batch(queries=queries)
n_queries = 3
n_items = 4
batch = build_test_batch(n_queries, n_items)
assert len(list(batch.valid_queries())) == n_queries
idxs, results = batch.flatten_query_results()
assert len(results) == n_queries*n_items
assert batch.get_item(*idxs[0]) == batch[idxs[0][0]][idxs[0][1]]
When items or queries are removed, this is accounted for
batch = build_test_batch(n_queries, n_items)
batch[1].update_internal(removed=True) # invalidate query
batch[0][0].update_internal(removed=True) # invalidate item
batch[0][1].update_internal(removed=True) # invalidate item
assert len(list(batch.valid_queries())) == n_queries-1 # 1 query removed
idxs, results = batch.flatten_query_results(skip_removed=False) # return all queries
assert len(results) == n_queries*n_items
# skips results where `removed=True`, and all results under a query with `removed=True`
idxs, results = batch.flatten_query_results(skip_removed=True)
# n_items removed from invalid query 1, 2 items invalidated
assert len(results) == n_queries*n_items - n_items - 2
Data Source
The DataSourceFunction schema defines the interface for data source queries. The function takes a list of MinimalQuery objects and returns a list of DataSourceResponse objects.
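A function with this shape might look like the following sketch, which performs a brute-force nearest-neighbour lookup over an in-memory dictionary. The `Sketch*` dataclasses are hypothetical stand-ins that only mirror the field names of the real schemas, and treating an empty result list as `valid=False` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchItem:                      # stand-in for Item
    item: Any
    embedding: List[float]


@dataclass
class SketchQuery:                     # stand-in for the query passed to the data source
    embedding: List[float]


@dataclass
class SketchDataSourceResponse:        # mirrors the fields valid, data, query_results
    valid: bool
    data: Optional[Dict]
    query_results: List[SketchItem] = field(default_factory=list)


# Hypothetical in-memory "vector database".
VECTORS = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [1.0, 1.0]}


def sq_dist(u: List[float], v: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


def toy_data_source(queries: List[SketchQuery], k: int = 2) -> List[SketchDataSourceResponse]:
    """Return one response per query, holding the k nearest stored vectors."""
    responses = []
    for query in queries:
        nearest = sorted(VECTORS.items(), key=lambda kv: sq_dist(kv[1], query.embedding))[:k]
        items = [SketchItem(item=name, embedding=vec) for name, vec in nearest]
        # Assumption: a query that returns nothing is reported as invalid.
        responses.append(SketchDataSourceResponse(valid=bool(items), data=None, query_results=items))
    return responses


responses = toy_data_source([SketchQuery([0.9, 0.9]), SketchQuery([0.1, 0.2])])
```

The function is batched: it receives all queries at once and returns responses in the same order, which lets a real backend issue a single bulk request.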
DataSourceResponse
DataSourceResponse (valid:bool, data:Optional[Dict], query_results:List[__main__.Item])
Filter
The FilterFunction schema defines the interface for filtering result items. The function takes a list of Item objects and returns a list of FilterResponse objects.
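A filter with this shape can be sketched as below. The `Sketch*` dataclasses are hypothetical stand-ins mirroring the schema field names, and the length criterion is purely illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchItem:                 # stand-in for Item
    item: Any
    embedding: List[float]


@dataclass
class SketchFilterResponse:       # mirrors the fields valid, data
    valid: bool
    data: Optional[Dict]


def length_filter(items: List[SketchItem], max_len: int = 5) -> List[SketchFilterResponse]:
    """Return one response per item, in input order; valid=False marks the item for removal."""
    responses = []
    for it in items:
        text = str(it.item)
        ok = isinstance(it.item, str) and len(text) <= max_len
        responses.append(SketchFilterResponse(valid=ok, data={"length": len(text)}))
    return responses


filter_responses = length_filter([
    SketchItem("abc", [0.1]),
    SketchItem("abcdefgh", [0.2]),
])
```

Returning a response for every input, rather than a shortened list, keeps results aligned with the items they describe.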
FilterResponse
FilterResponse (valid:bool, data:Optional[Dict])
Score
The ScoreFunction schema defines the interface for scoring result items. The function takes a list of Item objects and returns a list of ScoreResponse objects.
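The sketch below shows one possible scoring function of this shape. The `Sketch*` dataclasses and the `TARGET` point are hypothetical; returning `valid=False` with no score when scoring fails is an assumption about how the `valid` flag would be used.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchItem:                 # stand-in for Item
    item: Any
    embedding: List[float]


@dataclass
class SketchScoreResponse:        # mirrors the fields valid, score, data
    valid: bool
    score: Optional[float]
    data: Optional[Dict]


TARGET = [1.0, 1.0]               # hypothetical optimum used only for this example


def distance_score(items: List[SketchItem]) -> List[SketchScoreResponse]:
    """Score each item by negative Euclidean distance to TARGET (higher is better)."""
    responses = []
    for it in items:
        if len(it.embedding) != len(TARGET):
            # Scoring failed: report the item as invalid with no score.
            responses.append(SketchScoreResponse(valid=False, score=None, data={"error": "bad dimension"}))
            continue
        dist = math.dist(it.embedding, TARGET)
        responses.append(SketchScoreResponse(valid=True, score=-dist, data=None))
    return responses


scored = distance_score([
    SketchItem("near", [1.0, 1.0]),
    SketchItem("far", [0.0, 1.0]),
    SketchItem("broken", [0.5]),
])
```

Negating the distance keeps the convention that higher scores are better.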
ScoreResponse
ScoreResponse (valid:bool, score:Optional[float], data:Optional[Dict])
Prune
The PruneFunction schema defines the interface for pruning queries. The function takes a list of Query objects and returns a list of PruneResponse objects.
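As an illustration, the sketch below keeps the `k` queries whose results have the highest mean score and marks the rest as pruned. The `Sketch*` dataclasses are hypothetical stand-ins, and `result_scores` is a simplification of the scored Item results a real Query would hold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchQuery:
    embedding: List[float]
    result_scores: List[float]    # simplified stand-in for scored query results


@dataclass
class SketchPruneResponse:        # mirrors the fields valid, data
    valid: bool
    data: Optional[Dict]


def top_k_prune(queries: List[SketchQuery], k: int = 2) -> List[SketchPruneResponse]:
    """Keep the k queries with the highest mean result score; mark the others invalid."""
    means = [
        sum(q.result_scores) / len(q.result_scores) if q.result_scores else float("-inf")
        for q in queries
    ]
    ranked = sorted(range(len(queries)), key=lambda i: means[i], reverse=True)
    keep = set(ranked[:k])
    return [
        SketchPruneResponse(valid=(i in keep), data={"mean_score": means[i]})
        for i in range(len(queries))
    ]


prune_responses = top_k_prune([
    SketchQuery([0.0], [0.1, 0.3]),   # mean 0.2
    SketchQuery([0.1], [0.7, 0.9]),   # mean 0.8
    SketchQuery([0.2], [0.5, 0.5]),   # mean 0.5
], k=2)
```

Pruning before the update bounds the number of queries carried forward when an update produces several new queries per input.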
PruneResponse
PruneResponse (valid:bool, data:Optional[Dict])
Update
The UpdateFunction schema defines the interface for updating queries. The function takes a list of Query objects and returns a list of new Query objects.
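A sketch of an update that produces several new queries per input, while recording which parent each one came from, is shown below. The `Sketch*` dataclasses are hypothetical stand-ins; `SketchUpdateResponse` mirrors the `query` and `parent_id` fields, and the child ID format is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SketchItem:                 # stand-in for a scored Item
    item: Any
    embedding: List[float]
    score: float


@dataclass
class SketchQuery:                # stand-in for Query
    id: str
    embedding: List[float]
    item: Optional[Any] = None
    query_results: List[SketchItem] = field(default_factory=list)


@dataclass
class SketchUpdateResponse:       # mirrors the fields query, parent_id
    query: SketchQuery
    parent_id: Optional[str]


def top_k_discrete_update(queries: List[SketchQuery], k: int = 2) -> List[SketchUpdateResponse]:
    """Create one new query from each of the k best-scoring results of every input query."""
    responses = []
    for parent in queries:
        best = sorted(parent.query_results, key=lambda r: r.score, reverse=True)[:k]
        for rank, result in enumerate(best):
            # Hypothetical ID scheme: parent ID plus the rank of the result.
            child = SketchQuery(id=f"{parent.id}.{rank}", embedding=result.embedding, item=result.item)
            responses.append(SketchUpdateResponse(query=child, parent_id=parent.id))
    return responses


parent = SketchQuery(
    id="q0",
    embedding=[0.0],
    query_results=[
        SketchItem("low", [0.1], 0.2),
        SketchItem("high", [0.9], 0.9),
        SketchItem("mid", [0.5], 0.6),
    ],
)
update_responses = top_k_discrete_update([parent], k=2)
```

Because each new query is built from a specific result, it keeps an associated item, which is what makes this a discrete update.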
UpdateResponse
UpdateResponse (query:__main__.Query, parent_id:Optional[str])