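import numpy as np
import faiss

# FaissDataPlugin, DataSourceModule and build_batch_from_embeddings
# are assumed to be already imported

n_vectors = 256
d_vectors = 64
k = 10
n_queries = 5

# random vectors to index, cast to float32 as faiss expects
vectors = np.random.randn(n_vectors, d_vectors).astype(np.float32)
# one metadata dict per indexed vector
vector_data = [{'other': np.random.randint(0, 1000),
                'item': str(np.random.randint(0, 10000))}
               for i in range(vectors.shape[0])]

# exact L2 index over the vectors
index = faiss.IndexFlatL2(d_vectors)
index.add(vectors)

# return the 5 nearest neighbors per query, using 'item' as the item key
data_function = FaissDataPlugin(5, index, vector_data, 'item')
data_module = DataSourceModule(data_function)
batch = build_batch_from_embeddings(np.random.randn(n_queries, d_vectors))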
batch2 = data_module(batch)

Faiss Plugins
Faiss functions and classes
Faiss Data Plugin
The FaissDataPlugin integrates with a faiss index.
search_params can be any parameters object compatible with the index's faiss search (a faiss.SearchParameters instance).
FaissDataPlugin
FaissDataPlugin (k:int, faiss_index:faiss.swigfaiss_avx2.Index, item_data:Optional[List[Dict]]=None, item_key:Optional[str]=None, search_params:Optional[faiss.swigfaiss_avx2.SearchParameters]=None, distance_cutoff:Optional[float]=None)
FaissDataPlugin - data plugin for working with a faiss vector index
The data query runs a k-nearest-neighbors search against faiss_index.
Optionally, item_data can be provided as a list of dicts, where item_data[i] holds the data for embedding i in the faiss index.
If item_data is provided, item_data[i][item_key] defines the specific value for item i.
search_params is an optional faiss.SearchParameters object used for the search.
If distance_cutoff is specified, query results with a distance greater than distance_cutoff are ignored.
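The behavior described above can be sketched in plain NumPy. This is not the plugin's implementation: knn_with_cutoff is a hypothetical helper, the result dict layout is made up for illustration, and the distances are squared L2 (the metric IndexFlatL2 reports), on the assumption that distance_cutoff is compared against the raw index distances.

```python
import numpy as np

def knn_with_cutoff(queries, vectors, k, item_data=None, item_key=None,
                    distance_cutoff=None):
    """Hypothetical helper: brute-force k-NN with an optional distance cutoff."""
    # Squared L2 distance between every query and every indexed vector
    diffs = queries[:, None, :] - vectors[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Indices of the k closest vectors for each query, nearest first
    order = np.argsort(dists, axis=1)[:, :k]

    results = []
    for q, idxs in enumerate(order):
        hits = []
        for i in idxs:
            dist = float(dists[q, i])
            # Drop neighbors farther away than the cutoff
            if distance_cutoff is not None and dist > distance_cutoff:
                continue
            # item_data[i][item_key] is the item value for indexed vector i
            item = None
            if item_data is not None and item_key is not None:
                item = item_data[i][item_key]
            hits.append({"index": int(i), "distance": dist, "item": item})
        results.append(hits)
    return results

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
item_data = [{"item": "a"}, {"item": "b"}, {"item": "c"}, {"item": "d"}]
queries = np.array([[0.1, 0.0]])

hits = knn_with_cutoff(queries, vectors, k=3, item_data=item_data,
                       item_key="item", distance_cutoff=1.0)
print([h["item"] for h in hits[0]])  # ['a', 'b']
```

With k=3 the three nearest vectors are a, b and c, but c lies at squared distance 4.01 from the query, so the cutoff of 1.0 removes it. A query can therefore return fewer than k results when a cutoff is set.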
| | Type | Default | Details |
|---|---|---|---|
| k | int | | k nearest neighbors to return |
| faiss_index | Index | | faiss index |
| item_data | typing.Optional[typing.List[typing.Dict]] | None | Optional list of item data dicts |
| item_key | typing.Optional[str] | None | Optional key for item value (should be in item_data dict) |
| search_params | typing.Optional[faiss.swigfaiss_avx2.SearchParameters] | None | faiss search params |
| distance_cutoff | typing.Optional[float] | None | query to result distance cutoff |