utils

```python
inputs = list(range(10))
assert unbatch_list(batch_list(inputs, 3)) == inputs
```
unbatch_list
unbatch_list (inputs:List[List[Any]])
flattens a batched list
| | Type | Details |
|---|---|---|
| inputs | typing.List[typing.List[typing.Any]] | input batched list |
| Returns | typing.List[typing.Any] | flattened output list |
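For instance, flattening a ragged batched list (a quick usage sketch based on the description above, not output from the library's test suite):

```python
assert unbatch_list([[0, 1, 2], [3, 4]]) == [0, 1, 2, 3, 4]
```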
batch_list
batch_list (inputs:List[Any], batch_size:int)
batches the input list into chunks of size batch_size, with the last batch ragged
if batch_size=0, all inputs are returned as a single batch
| | Type | Details |
|---|---|---|
| inputs | typing.List[typing.Any] | input list to be batched |
| batch_size | int | batch size |
| Returns | typing.List[typing.List[typing.Any]] | batched output list |
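For instance, batching 10 items with batch_size=3 should leave a ragged final batch (a usage sketch based on the description above):

```python
assert batch_list(list(range(10)), 3) == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```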
build_batch_from_embeddings
build_batch_from_embeddings (embeddings:List[List[float]])
creates a Batch from a list of embeddings. Each embedding is converted to a Query with a unique collection_id
| | Type | Details |
|---|---|---|
| embeddings | typing.List[typing.List[float]] | input embeddings |
| Returns | Batch | output batch |
```python
build_batch_from_embeddings([[0.1], [0.2]])
```

```
Batch(queries=[Query(item=None, embedding=[0.1], data={}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=0, iteration=None), id='query_191d47ea-5809-11ee-b05f-db94e348bdfb'), Query(item=None, embedding=[0.2], data={}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=1, iteration=None), id='query_191d47eb-5809-11ee-b05f-db94e348bdfb')])
```
build_batch_from_items
build_batch_from_items (items:List[emb_opt.schemas.Item], remap_collections=False)
creates a Batch from a list of Item objects. Each Item is converted to a Query. If remap_collections=True, each Query is given a unique collection_id. Otherwise, each Query retains the collection_id of the Item used to create it
| | Type | Default | Details |
|---|---|---|---|
| items | typing.List[emb_opt.schemas.Item] | | input items |
| remap_collections | bool | False | whether collection IDs should be remapped |
| Returns | Batch | | output batch |
```python
build_batch_from_items([Item.from_minimal(embedding=[0.1])], remap_collections=True)
```

```
Batch(queries=[Query(item=None, embedding=[0.1], data={'_source_item_id': 'item_191d47ec-5809-11ee-b05f-db94e348bdfb'}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=0, iteration=None), id='query_191d47ed-5809-11ee-b05f-db94e348bdfb')])
```
whiten
whiten (scores:numpy.ndarray)
Whitens a vector of scores (zero mean, unit variance)
| | Type | Details |
|---|---|---|
| scores | ndarray | vector shape (n,) of scores to whiten |
| Returns | ndarray | vector shape (n,) of whitened scores |
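Based on the advantage formula used by compute_rl_grad below, whitening subtracts the mean and divides by the standard deviation. A minimal sketch of that behavior (`whiten_sketch` is illustrative, not the library's implementation):

```python
import numpy as np

def whiten_sketch(scores: np.ndarray) -> np.ndarray:
    # zero-center and scale to unit variance
    return (scores - scores.mean()) / scores.std()

assert np.allclose(whiten_sketch(np.array([1., 2., 3.])).mean(), 0.0)
```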
clip_grad
clip_grad (grad:numpy.ndarray, max_norm:float, norm_type:Union[float,int,str])

clips grad to max_norm, using norm_type to compute the norm
| | Type | Details |
|---|---|---|
| grad | ndarray | grad vector |
| max_norm | float | max grad norm |
| norm_type | typing.Union[float, int, str] | type of norm to use |
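The parameters above imply standard norm-based clipping. A minimal sketch of that idea (`clip_grad_sketch` is illustrative; the library's exact handling of `norm_type` and batched inputs may differ):

```python
import numpy as np

def clip_grad_sketch(grad: np.ndarray, max_norm: float, norm_type=2) -> np.ndarray:
    # per-vector norm; axis=-1 handles both a single vector and a batch of rows
    norm = np.linalg.norm(grad, ord=norm_type, axis=-1, keepdims=True)
    # scale down any vector whose norm exceeds max_norm, leave the rest unchanged
    scale = np.minimum(1.0, max_norm / np.maximum(norm, 1e-12))
    return grad * scale
```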
```python
import numpy as np

grad = np.array([1, 2, 3, 4, 5])
grads = np.stack([grad, grad])

# clipping a single vector matches clipping it as a row of a batch
assert (clip_grad(grad, 1., 2) == clip_grad(grads, 1., 2)[0]).all()
```

query_to_rl_inputs
query_to_rl_inputs (query:emb_opt.schemas.Query)
compute_rl_grad
compute_rl_grad (query_embeddings:numpy.ndarray, result_embeddings:numpy.ndarray, result_scores:numpy.ndarray, distance_penalty:float=0, max_norm:Optional[float]=None, norm_type:Union[float,int,str,NoneType]=2.0, score_grad=False)
compute_rl_grad - uses reinforcement learning to estimate query gradients
To compute the gradient with RL:

1. compute advantages by whitening scores
    1. `advantage[i] = (scores[i] - scores.mean()) / scores.std()`
2. compute advantage loss
    1. `advantage_loss[i] = advantage[i] * (query_embedding - result_embedding[i])**2`
3. compute distance loss
    1. `distance_loss[i] = distance_penalty * (query_embedding - result_embedding[i])**2`
4. sum loss terms
    1. `loss[i] = advantage_loss[i] + distance_loss[i]`
5. compute the gradient
This gives a closed-form calculation of the gradient:

```
grad[i] = 2 * (advantage[i] + distance_penalty) * (query_embedding - result_embedding[i])
```
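A sketch of this closed-form gradient for a single query (`rl_grad_sketch` is illustrative; averaging over results is an assumption, not necessarily how `compute_rl_grad` aggregates):

```python
import numpy as np

def rl_grad_sketch(query_embedding, result_embeddings, result_scores,
                   distance_penalty=0.0, score_grad=False):
    # advantages: whitened scores
    advantage = (result_scores - result_scores.mean()) / result_scores.std()
    # closed-form per-result gradients: 2 * (advantage + penalty) * (query - result)
    grads = 2 * (advantage + distance_penalty)[:, None] * (query_embedding - result_embeddings)
    grad = grads.mean(axis=0)  # aggregation choice (mean) is an assumption
    # score_grad=True flips the sign so the gradient points toward higher scores
    return -grad if score_grad else grad
```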
if max_norm is specified, the gradient will be clipped using norm_type
if score_grad=True, the sign of the gradient is flipped. The standard sign is designed for minimizing the loss with gradient descent, updating via `n_new = n_old - lr * grad`. With the sign flipped, the gradient points directly in the direction of increasing score, which is conceptually aligned with hill climbing, updating via `n_new = n_old + lr * grad`. Use score_grad=False for anything using gradient descent.
| | Type | Default | Details |
|---|---|---|---|
| query_embeddings | ndarray | | matrix of query embeddings |
| result_embeddings | ndarray | | matrix of result embeddings |
| result_scores | ndarray | | array of scores |
| distance_penalty | float | 0 | distance penalty coefficient |
| max_norm | typing.Optional[float] | None | max gradient norm |
| norm_type | typing.Union[float, int, str, NoneType] | 2.0 | type of norm to use |
| score_grad | bool | False | whether to return the score gradient (sign flipped) or the loss gradient |