Knowledge-Intensive LLM Applications

Author: Yan Lu
Retrieval-Augmented Generation (RAG) enables large language models to utilize external knowledge sources to improve their performance. Compared with fine-tuning, it is a more cost-effective way to inject new knowledge. However, building a reliable RAG system remains challenging, both because compound AI systems are complex and because long-context inference is expensive.
In this post, we will dive into RAG and explore recent research advances.
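To make the retrieve-then-generate loop concrete, here is a minimal Python sketch. The toy corpus, the word-overlap retriever, and the `call_llm` placeholder are all illustrative assumptions rather than any particular library's API; a production system would swap in a vector index and a real model endpoint.

```python
# A minimal retrieve-then-generate loop. `call_llm` is a hypothetical
# stand-in for any chat-completion API; the retriever is a toy
# bag-of-words scorer rather than a production vector index.
from collections import Counter

DOCUMENTS = [
    "RAG retrieves passages from an external corpus and conditions generation on them.",
    "Fine-tuning updates model weights on task-specific data.",
    "Long-context inference grows more expensive as the prompt length increases.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q_terms = Counter(query.lower().split())
    def score(doc: str) -> int:
        return sum((Counter(doc.lower().split()) & q_terms).values())
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Inject the retrieved passages as grounding context for the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: replace with your model provider's API call.
    return f"[LLM response to prompt of {len(prompt)} chars]"

if __name__ == "__main__":
    question = "How does RAG use external knowledge?"
    answer = call_llm(build_prompt(question, retrieve(question, DOCUMENTS)))
    print(answer)
```

Even at this scale, the key design property is visible: retrieval quality bounds answer quality, since the model only sees whatever the retriever returns.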
What is RAG?
RAG vs. Fine-Tuning
Main Challenges
Recent Advances
Open-Source RAG Engines
Hands-on Tutorials
Conclusion
References
- Prompt Engineering Guide. "Retrieval Augmented Generation (RAG) for LLMs." Tech blog (2024).
- Meta AI. "To fine-tune or not to fine-tune." Tech blog (2024).
- Weaviate. "What is Agentic RAG." Tech blog (2024).
- Zhang et al. "RAFT: Adapting Language Model to Domain Specific RAG." COLM 2024.
- Asai et al. "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." ICLR 2024 (oral).
- Amazon. "RAGChecker: A Fine-grained Framework for Diagnosing RAG." GitHub repo.
- InfiniFlow. "RAGFlow: an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding." GitHub repo.
- LangGenius. "Dify: an open-source LLM app development platform." GitHub repo.
- LlamaIndex. "LlamaIndex: a data framework for your LLM applications." GitHub repo.
- crewAI. "Framework for orchestrating role-playing, autonomous AI agents." GitHub repo.
- Stanford NLP. "DSPy: The framework for programming—not prompting—language models." GitHub repo.