# utils

Utility functions

## unbatch_list

```
unbatch_list (inputs:List[List[Any]])
```

Flattens a batched list.

| | Type | Details |
|---|---|---|
| inputs | typing.List[typing.List[typing.Any]] | input batched list |
| Returns | typing.List[typing.Any] | flattened output list |
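
For example, flattening one level of nesting (the import path is assumed, not confirmed by this page):

```python
from emb_opt.utils import unbatch_list  # assumed import path

assert unbatch_list([[1, 2], [3], [4, 5]]) == [1, 2, 3, 4, 5]
```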

## batch_list

```
batch_list (inputs:List[Any], batch_size:int)
```

Batches the input list into chunks of size `batch_size`, with the last batch ragged if the input length is not evenly divisible.

If `batch_size=0`, all inputs are returned as a single batch.

| | Type | Details |
|---|---|---|
| inputs | typing.List[typing.Any] | input list to be batched |
| batch_size | int | batch size |
| Returns | typing.List[typing.List[typing.Any]] | batched output list |
```python
inputs = list(range(10))
assert unbatch_list(batch_list(inputs, 3)) == inputs
```
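
To illustrate the ragged final batch and the `batch_size=0` case (the single-batch result for `batch_size=0` is inferred from the round-trip invariant above, not verified against the source):

```python
from emb_opt.utils import batch_list  # assumed import path

batches = batch_list(list(range(10)), 3)
assert batches == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]  # last batch is ragged

# batch_size=0: all inputs in one batch (inferred behavior)
assert batch_list([1, 2, 3], 0) == [[1, 2, 3]]
```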

## build_batch_from_embeddings

```
build_batch_from_embeddings (embeddings:List[List[float]])
```

Creates a `Batch` from a list of embeddings. Each embedding is converted to a `Query` with a unique `collection_id`.

| | Type | Details |
|---|---|---|
| embeddings | typing.List[typing.List[float]] | input embeddings |
| Returns | Batch | output batch |
```python
build_batch_from_embeddings([[0.1], [0.2]])
```

```
Batch(queries=[Query(item=None, embedding=[0.1], data={}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=0, iteration=None), id='query_191d47ea-5809-11ee-b05f-db94e348bdfb'), Query(item=None, embedding=[0.2], data={}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=1, iteration=None), id='query_191d47eb-5809-11ee-b05f-db94e348bdfb')])
```

## build_batch_from_items

```
build_batch_from_items (items:List[emb_opt.schemas.Item],
                        remap_collections=False)
```

Creates a `Batch` from a list of `Item` objects. Each `Item` is converted to a `Query`. If `remap_collections=True`, each `Query` is given a unique `collection_id`; otherwise, each `Query` retains the `collection_id` of the `Item` used to create it.

| | Type | Default | Details |
|---|---|---|---|
| items | typing.List[emb_opt.schemas.Item] | | input items |
| remap_collections | bool | False | whether collection IDs should be remapped |
| Returns | Batch | | output batch |
```python
build_batch_from_items([Item.from_minimal(embedding=[0.1])], remap_collections=True)
```

```
Batch(queries=[Query(item=None, embedding=[0.1], data={'_source_item_id': 'item_191d47ec-5809-11ee-b05f-db94e348bdfb'}, query_results=[], internal=InteralData(removed=False, removal_reason=None, parent_id=None, collection_id=0, iteration=None), id='query_191d47ed-5809-11ee-b05f-db94e348bdfb')])
```

## whiten

```
whiten (scores:numpy.ndarray)
```

Whitens a vector of scores (subtracts the mean and divides by the standard deviation).

| | Type | Details |
|---|---|---|
| scores | ndarray | vector shape (n,) of scores to whiten |
| Returns | ndarray | vector shape (n,) whitened scores |
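
A minimal sketch of the computation, following the whitening step described under `compute_rl_grad` below; the library implementation may differ in details:

```python
import numpy as np

def whiten_sketch(scores: np.ndarray) -> np.ndarray:
    # zero mean, unit variance: (x - mean) / std
    return (scores - scores.mean()) / scores.std()

scores = np.array([1.0, 2.0, 3.0, 4.0])
out = whiten_sketch(scores)
assert np.isclose(out.mean(), 0.0) and np.isclose(out.std(), 1.0)
```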

## clip_grad

```
clip_grad (grad:numpy.ndarray, max_norm:float,
           norm_type:Union[float,int,str])
```
Clips the gradient so that its norm, computed with `norm_type`, does not exceed `max_norm`.

| | Type | Details |
|---|---|---|
| grad | ndarray | grad vector |
| max_norm | float | max grad norm |
| norm_type | typing.Union[float, int, str] | type of norm to use |
```python
import numpy as np

grad = np.array([1, 2, 3, 4, 5])
grads = np.stack([grad, grad])
assert (clip_grad(grad, 1., 2) == clip_grad(grads, 1., 2)[0]).all()
```
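
The clipping logic can be sketched as follows; `clip_grad_sketch` is an illustrative stand-in, not the library code, and assumes rows are clipped independently when a matrix is passed (consistent with the assert above):

```python
import numpy as np

def clip_grad_sketch(grad: np.ndarray, max_norm: float, norm_type: float = 2) -> np.ndarray:
    # treat a (d,) vector as a single row; clip each row independently
    g = np.atleast_2d(grad).astype(float)
    norms = np.linalg.norm(g, ord=norm_type, axis=-1, keepdims=True)
    # rescale rows whose norm exceeds max_norm; leave the others untouched
    g = g * np.minimum(1.0, max_norm / norms)
    return g.reshape(grad.shape)

g = clip_grad_sketch(np.array([3.0, 4.0]), max_norm=1.0)
assert np.isclose(np.linalg.norm(g), 1.0)
```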

## query_to_rl_inputs

```
query_to_rl_inputs (query:emb_opt.schemas.Query)
```

## compute_rl_grad

```
compute_rl_grad (query_embeddings:numpy.ndarray,
                 result_embeddings:numpy.ndarray,
                 result_scores:numpy.ndarray, distance_penalty:float=0,
                 max_norm:Optional[float]=None,
                 norm_type:Union[float,int,str,NoneType]=2.0,
                 score_grad=False)
```

`compute_rl_grad` uses reinforcement learning to estimate query gradients.

To compute the gradient with RL:

1. Compute advantages by whitening the scores: `advantage[i] = (scores[i] - scores.mean()) / scores.std()`
2. Compute the advantage loss: `advantage_loss[i] = advantage[i] * (query_embedding - result_embedding[i])**2`
3. Compute the distance loss: `distance_loss[i] = distance_penalty * (query_embedding - result_embedding[i])**2`
4. Sum the loss terms: `loss[i] = advantage_loss[i] + distance_loss[i]`
5. Compute the gradient.

Differentiating `loss[i]` with respect to the query embedding gives a closed-form expression for the gradient:

`grad[i] = 2 * (advantage[i] + distance_penalty) * (query_embedding - result_embedding[i])`

If `max_norm` is specified, the gradient is clipped using `norm_type`.

If `score_grad=True`, the sign of the gradient is flipped. The standard sign is designed for minimizing the loss via gradient descent, updating with `n_new = n_old - lr * grad`. With the sign flipped, the gradient points directly in the direction of increasing score, which is conceptually aligned with hill climbing, updating with `n_new = n_old + lr * grad`. Use `score_grad=False` for anything using gradient descent.

| | Type | Default | Details |
|---|---|---|---|
| query_embeddings | ndarray | | matrix of query embeddings |
| result_embeddings | ndarray | | matrix of result embeddings |
| result_scores | ndarray | | array of scores |
| distance_penalty | float | 0 | distance penalty coefficient |
| max_norm | typing.Optional[float] | None | max gradient norm |
| norm_type | typing.Union[float, int, str, NoneType] | 2.0 | type of norm to use |
| score_grad | bool | False | whether to return the sign-flipped score gradient or the loss gradient |
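
A self-contained sketch of the closed-form gradient above for a single query; all names are illustrative rather than the library API, and averaging the per-result gradients is an assumption of this sketch, not a documented detail:

```python
import numpy as np

def rl_grad_sketch(query_emb: np.ndarray,     # (d,) single query embedding
                   result_embs: np.ndarray,   # (n, d) result embeddings
                   scores: np.ndarray,        # (n,) result scores
                   distance_penalty: float = 0.0,
                   score_grad: bool = False) -> np.ndarray:
    # step 1: whiten scores into advantages
    advantages = (scores - scores.mean()) / scores.std()
    # closed form: grad[i] = 2 * (advantage[i] + distance_penalty) * (query_emb - result_embs[i])
    per_result = 2 * (advantages[:, None] + distance_penalty) * (query_emb - result_embs)
    grad = per_result.mean(axis=0)  # aggregation choice (mean) is an assumption
    # score_grad=True flips the sign so the gradient points toward higher scores
    return -grad if score_grad else grad

query = np.zeros(2)
results = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
scores = np.array([1.0, 0.0, 0.5])

# gradient descent update (score_grad=False): step against the gradient
updated = query - 0.1 * rl_grad_sketch(query, results, scores)
```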