Hi! I'm using R2R, and I'd like to know whether (and how) it's possible to search over metadata in addition to the chunks of different documents. Specifically:

- Is there a way to include metadata (e.g., title, author, category) in the retrieval process?
- Is it possible to integrate it with a simple semantic search approach?

When enabling the agent, I saw that it can search over metadata, but what about searching without the agent?

My goal is to build a pipeline where:

- Indexed documents include meaningful metadata.
- The user query is matched against both content and metadata.
Hey @Ayoubimad, hopefully the examples below illustrate how to do this. They are taken from the API/SDK documentation here. Let me know if this doesn't achieve what you're trying to do.

from r2r import R2RClient

client = R2RClient()
# if using auth, do client.login(...)

# Basic search
response = client.retrieval.search(
    query="What is DeepSeek R1?",
)

# Advanced mode with specific filters
response = client.retrieval.search(
    query="What is DeepSeek R1?",
    search_mode="advanced",
    search_settings={
        "filters": {"document_id": {"$eq": "e43864f5-a36f-548e-aacd-6f8d48b30c7f"}},
        "limit": 5
    }
)

# Using hybrid search
response = client.retrieval.search(
    query="What was Uber's profit in 2020?",
    search_settings={
        "use_hybrid_search": True,
        "hybrid_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 200,
            "rrf_k": 50
        },
        "filters": {"title": {"$in": ["DeepSeek_R1.pdf"]}},
    }
)

# Advanced filtering
results = client.retrieval.search(
    query="What are the effects of climate change?",
    search_settings={
        "filters": {
            "$and": [
                {"document_type": {"$eq": "pdf"}},
                {"metadata.year": {"$gt": 2020}}
            ]
        },
        "limit": 10
    }
)

# Knowledge graph enhanced search
results = client.retrieval.search(
    query="What was DeepSeek R1",
)
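For the metadata use case in the question, the filter examples above could be wrapped roughly as follows. This is only a sketch under assumptions: the helper names `metadata_filter`, `build_search_settings`, and `search_with_metadata` are hypothetical (not part of the R2R SDK), the `metadata.` key prefix simply follows the `metadata.year` example, and whether custom keys such as `author` or `category` are filterable this way should be checked against the R2R docs for your version. Only `client.retrieval.search(query=..., search_settings=...)` and the `filters`, `limit`, and `use_hybrid_search` settings are taken from the examples above.

```python
from typing import Any


def metadata_filter(**equals: Any) -> dict:
    """Build a filter requiring every given metadata field to equal its value.

    Hypothetical helper: keys get a "metadata." prefix, mirroring the
    `metadata.year` filter shown in the examples above.
    """
    clauses = [{f"metadata.{key}": {"$eq": value}} for key, value in equals.items()]
    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


def build_search_settings(filters: dict, limit: int = 10, hybrid: bool = False) -> dict:
    """Assemble a search_settings dict from the keys used in the examples above."""
    settings: dict = {"limit": limit}
    if filters:
        settings["filters"] = filters
    if hybrid:
        settings["use_hybrid_search"] = True
    return settings


def search_with_metadata(client, query: str, limit: int = 10,
                         hybrid: bool = False, **equals: Any):
    """Run a non-agent search restricted by metadata.

    `client` is assumed to be an R2RClient (or anything exposing
    `client.retrieval.search(query=..., search_settings=...)`).
    """
    settings = build_search_settings(metadata_filter(**equals), limit, hybrid)
    return client.retrieval.search(query=query, search_settings=settings)
```

Usage would look something like `search_with_metadata(client, "What are the effects of climate change?", limit=5, category="climate", author="Jane Doe")`, where the metadata values are made up for illustration. Note that filters like these narrow the candidate chunks by exact metadata values; they do not semantically match the query text against the metadata itself.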