# utils

Utility functions

## unbatch_list

```
unbatch_list (inputs:List[List[Any]])
```

Flattens a batched list.

| | Type | Details |
|---|---|---|
| inputs | typing.List[typing.List[typing.Any]] | input batched list |
| Returns | typing.List[typing.Any] | flattened output list |
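
For example, flattening one level of nesting (the import path is assumed, not confirmed by this page):

```python
from emb_opt.utils import unbatch_list  # assumed import path

assert unbatch_list([[1, 2], [3], [4, 5]]) == [1, 2, 3, 4, 5]
```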

## batch_list

```
batch_list (inputs:List[Any], batch_size:int)
```

Batches the input list into chunks of size `batch_size`, with the last batch ragged if the input length is not evenly divisible.

If `batch_size=0`, all inputs are returned as a single batch.

| | Type | Details |
|---|---|---|
| inputs | typing.List[typing.Any] | input list to be batched |
| batch_size | int | batch size |
| Returns | typing.List[typing.List[typing.Any]] | batched output list |
```python
inputs = list(range(10))
assert unbatch_list(batch_list(inputs, 3)) == inputs
```
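
To illustrate the ragged final batch and the `batch_size=0` case (the single-batch result for `batch_size=0` is inferred from the round-trip invariant above, not verified against the source):

```python
from emb_opt.utils import batch_list  # assumed import path

batches = batch_list(list(range(10)), 3)
assert batches == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]  # last batch is ragged

# batch_size=0: all inputs in one batch (inferred behavior)
assert batch_list([1, 2, 3], 0) == [[1, 2, 3]]
```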

## build_batch_from_embeddings

```
build_batch_from_embeddings (embeddings:List[List[float]])
```

Creates a `Batch` from a list of embeddings. Each embedding is converted to a `Query` with a unique `collection_id`.

| | Type | Details |
|---|---|---|
| embeddings | typing.List[typing.List[float]] | input embeddings |
| Returns | Batch | output batch |
```python
build_batch_from_embeddings([[0.1], [0.2]])
```

```
Batch(queries=[Query(item=None, embedding=[0.1], data={}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=0, iteration=None), id='query_191d47ea-5809-11ee-b05f-db94e348bdfb'), Query(item=None, embedding=[0.2], data={}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=1, iteration=None), id='query_191d47eb-5809-11ee-b05f-db94e348bdfb')])
```

## build_batch_from_items

```
build_batch_from_items (items:List[emb_opt.schemas.Item],
                        remap_collections=False)
```

Creates a `Batch` from a list of `Item` objects. Each `Item` is converted to a `Query`. If `remap_collections=True`, each `Query` is given a unique `collection_id`; otherwise, each `Query` retains the `collection_id` of the `Item` used to create it.

| | Type | Default | Details |
|---|---|---|---|
| items | typing.List[emb_opt.schemas.Item] | | input items |
| remap_collections | bool | False | whether collection IDs should be remapped |
| Returns | Batch | | output batch |
```python
build_batch_from_items([Item.from_minimal(embedding=[0.1])], remap_collections=True)
```

```
Batch(queries=[Query(item=None, embedding=[0.1], data={'_source_item_id': 'item_191d47ec-5809-11ee-b05f-db94e348bdfb'}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=0, iteration=None), id='query_191d47ed-5809-11ee-b05f-db94e348bdfb')])
```

## whiten

```
whiten (scores:numpy.ndarray)
```

Whitens a vector of scores (subtracts the mean and divides by the standard deviation).

| | Type | Details |
|---|---|---|
| scores | ndarray | vector shape (n,) of scores to whiten |
| Returns | ndarray | vector shape (n,) whitened scores |
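
A minimal sketch of the computation, following the whitening step described under `compute_rl_grad` below; the library implementation may differ in details:

```python
import numpy as np

def whiten_sketch(scores: np.ndarray) -> np.ndarray:
    # zero mean, unit variance: (x - mean) / std
    return (scores - scores.mean()) / scores.std()

scores = np.array([1.0, 2.0, 3.0, 4.0])
out = whiten_sketch(scores)
assert np.isclose(out.mean(), 0.0) and np.isclose(out.std(), 1.0)
```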

## clip_grad

```
clip_grad (grad:numpy.ndarray, max_norm:float,
           norm_type:Union[float,int,str])
```
Clips the gradient so that its norm, computed with `norm_type`, does not exceed `max_norm`.

| | Type | Details |
|---|---|---|
| grad | ndarray | grad vector |
| max_norm | float | max grad norm |
| norm_type | typing.Union[float, int, str] | type of norm to use |
```python
import numpy as np

grad = np.array([1, 2, 3, 4, 5])
grads = np.stack([grad, grad])
assert (clip_grad(grad, 1., 2) == clip_grad(grads, 1., 2)[0]).all()
```
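
The clipping logic can be sketched as follows; `clip_grad_sketch` is an illustrative stand-in, not the library code, and assumes rows are clipped independently when a matrix is passed (consistent with the assert above):

```python
import numpy as np

def clip_grad_sketch(grad: np.ndarray, max_norm: float, norm_type: float = 2) -> np.ndarray:
    # treat a (d,) vector as a single row; clip each row independently
    g = np.atleast_2d(grad).astype(float)
    norms = np.linalg.norm(g, ord=norm_type, axis=-1, keepdims=True)
    # rescale rows whose norm exceeds max_norm; leave the others untouched
    g = g * np.minimum(1.0, max_norm / norms)
    return g.reshape(grad.shape)

g = clip_grad_sketch(np.array([3.0, 4.0]), max_norm=1.0)
assert np.isclose(np.linalg.norm(g), 1.0)
```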

## query_to_rl_inputs

```
query_to_rl_inputs (query:emb_opt.schemas.Query)
```

## compute_rl_grad

```
compute_rl_grad (query_embeddings:numpy.ndarray,
                 result_embeddings:numpy.ndarray,
                 result_scores:numpy.ndarray, distance_penalty:float=0,
                 max_norm:Optional[float]=None,
                 norm_type:Union[float,int,str,NoneType]=2.0,
                 score_grad=False)
```

`compute_rl_grad` uses reinforcement learning to estimate query gradients.

To compute the gradient with RL:

1. Compute advantages by whitening the scores: `advantage[i] = (scores[i] - scores.mean()) / scores.std()`
2. Compute the advantage loss: `advantage_loss[i] = advantage[i] * (query_embedding - result_embedding[i])**2`
3. Compute the distance loss: `distance_loss[i] = distance_penalty * (query_embedding - result_embedding[i])**2`
4. Sum the loss terms: `loss[i] = advantage_loss[i] + distance_loss[i]`
5. Compute the gradient.

Differentiating `loss[i]` with respect to the query embedding gives a closed-form expression for the gradient:

`grad[i] = 2 * (advantage[i] + distance_penalty) * (query_embedding - result_embedding[i])`

If `max_norm` is specified, the gradient is clipped using `norm_type`.

If `score_grad=True`, the sign of the gradient is flipped. The standard sign is designed for minimizing the loss via gradient descent, updating with `n_new = n_old - lr * grad`. With the sign flipped, the gradient points directly in the direction of increasing score, which is conceptually aligned with hill climbing, updating with `n_new = n_old + lr * grad`. Use `score_grad=False` for anything using gradient descent.

| | Type | Default | Details |
|---|---|---|---|
| query_embeddings | ndarray | | matrix of query embeddings |
| result_embeddings | ndarray | | matrix of result embeddings |
| result_scores | ndarray | | array of scores |
| distance_penalty | float | 0 | distance penalty coefficient |
| max_norm | typing.Optional[float] | None | max gradient norm |
| norm_type | typing.Union[float, int, str, NoneType] | 2.0 | type of norm to use |
| score_grad | bool | False | whether to return the sign-flipped score gradient or the loss gradient |
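
A self-contained sketch of the closed-form gradient above for a single query; all names are illustrative rather than the library API, and averaging the per-result gradients is an assumption of this sketch, not a documented detail:

```python
import numpy as np

def rl_grad_sketch(query_emb: np.ndarray,     # (d,) single query embedding
                   result_embs: np.ndarray,   # (n, d) result embeddings
                   scores: np.ndarray,        # (n,) result scores
                   distance_penalty: float = 0.0,
                   score_grad: bool = False) -> np.ndarray:
    # step 1: whiten scores into advantages
    advantages = (scores - scores.mean()) / scores.std()
    # closed form: grad[i] = 2 * (advantage[i] + distance_penalty) * (query_emb - result_embs[i])
    per_result = 2 * (advantages[:, None] + distance_penalty) * (query_emb - result_embs)
    grad = per_result.mean(axis=0)  # aggregation choice (mean) is an assumption
    # score_grad=True flips the sign so the gradient points toward higher scores
    return -grad if score_grad else grad

query = np.zeros(2)
results = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
scores = np.array([1.0, 0.0, 0.5])

# gradient descent update (score_grad=False): step against the gradient
updated = query - 0.1 * rl_grad_sketch(query, results, scores)
```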