Expanding Jori's Knowledge Base

The system uses a multi-layered Retrieval-Augmented Generation (RAG) pipeline to process Fast Healthcare Interoperability Resources (FHIR) documents. The pipeline combines components from the langchain library with external services: Pinecone for vector storage, and OpenAI for embeddings and language-model processing. It is tailored specifically to health analytics questions.

The problem space in health analytics is vast and intricate, encompassing a wide array of data points from patient records to clinical studies. To navigate this space, Jori employs a multi-layered memory recall system designed to retrieve the information relevant to a query efficiently and accurately. Here’s a breakdown of how Jori operates mathematically and iteratively:

  1. Vector Space Modeling and Cosine Similarity: Jori represents documents and queries as vectors in a high-dimensional space. Using cosine similarity (the cosine of the angle between two vectors), it measures how close the query vector is to each document vector. The highest-scoring documents form the initial candidate set, so the first filter already reflects the query's context.

  2. Layered Caching Mechanism: With its layered cache, Jori can reference a variety of pre-processed information. This includes direct matches from past queries (L1), similar queries found via cosine similarity (L2), combinations of related queries (L3), and responses enhanced through reinforcement learning (L5). Each layer serves as a sieve, progressively refining the selection of relevant documents and information snippets.

  3. MapReduce Paradigm: The MapReduce function in Jori is a two-step process:

    • Map: Each document is analyzed individually to extract key themes and information. This is akin to decomposing a complex problem into smaller, manageable sub-problems.

    • Reduce: These sub-results are then combined to form a coherent summary or answer. This step resembles the integration of partial solutions to form a comprehensive answer to the original problem.

  4. Iterative Learning: Through reinforcement learning (L5), Jori can iteratively improve its responses. Each iteration involves exploration (trying new responses) and exploitation (using the best-known response), with a reward mechanism providing feedback. This resembles a mathematical optimization problem in which the goal is to find the maximum of a function, here the reward.

  5. Continuous Improvement: By running in a loop, Jori continuously refines its policy (the strategy for selecting responses), optimizing the answer selection over time. This is comparable to a converging sequence in mathematics, where each iteration brings the sequence closer to its limit.
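
The first two cache layers described above can be sketched in a few lines. This is a minimal illustration, not Jori's actual implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the `LayeredCache` class, its method names, and the `0.8` threshold are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(text, vocab):
    """Toy embedding (assumption): word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


class LayeredCache:
    """Hypothetical two-layer cache: exact match (L1), then similarity (L2)."""

    def __init__(self, vocab, threshold=0.8):
        self.vocab = vocab
        self.threshold = threshold  # minimum cosine score for an L2 hit
        self.exact = {}             # L1: query text -> stored answer
        self.entries = []           # L2: (query vector, stored answer)

    def store(self, query, answer):
        self.exact[query] = answer
        self.entries.append((embed(query, self.vocab), answer))

    def lookup(self, query):
        # L1: the same query was asked before
        if query in self.exact:
            return "L1", self.exact[query]
        # L2: the most similar past query, if it is similar enough
        query_vec = embed(query, self.vocab)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold:
            return "L2", best_answer
        return "miss", None
```

A caller would try `lookup` first and fall through to the full retrieval pipeline only on a `"miss"`, storing the new result with `store` afterwards.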

Running Jori in a loop on a variety of health analytics questions builds a comprehensive memory bank. This bank grows and evolves, allowing the system to handle an expanding range of queries. The memory bank is not static; it is dynamic and self-optimizing, which helps the AI remain relevant and accurate as new data and questions emerge.
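
As a rough illustration of how a loop over questions fills a memory bank, the sketch below pairs a toy map step and reduce step with a plain dictionary. It assumes keyword matching in place of the LLM-based analysis, and the names `map_document`, `reduce_results`, and `run_loop` are hypothetical.

```python
def map_document(doc, query_terms):
    """Map: keep the sentences of one document that mention a query term."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return [s for s in sentences if query_terms & set(s.lower().split())]


def reduce_results(partials):
    """Reduce: merge per-document results, dropping duplicates in order."""
    seen, merged = set(), []
    for part in partials:
        for sentence in part:
            if sentence not in seen:
                seen.add(sentence)
                merged.append(sentence)
    return ". ".join(merged)


def run_loop(questions, documents):
    """Answer each question once and remember the result."""
    memory_bank = {}   # question -> combined answer
    pipeline_runs = 0  # how often the map and reduce steps actually ran
    for question in questions:
        if question in memory_bank:
            continue   # already answered: served from memory
        terms = set(question.lower().split())
        partials = [map_document(doc, terms) for doc in documents]
        memory_bank[question] = reduce_results(partials)
        pipeline_runs += 1
    return memory_bank, pipeline_runs


documents = [
    "Patient A has diabetes. Patient A is 54",
    "Patient B has asthma. Patient B has diabetes",
]
bank, runs = run_loop(["diabetes", "asthma", "diabetes"], documents)
```

In this sketch the repeated question does not trigger a second pipeline run, which is the sense in which the bank saves work as it grows.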

In essence, Jori's looped operation on diverse health queries is analogous to a mathematical function being applied across a domain. With each iteration, the function (Jori) becomes more refined, expanding its range and the complexity of problems it can solve. This iterative approach, coupled with a layered memory system, allows Jori to build enough 'memories' or data points to address a wide spectrum of problems within the health analytics domain.
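
The exploration and exploitation trade-off behind this iterative refinement can be illustrated with an epsilon-greedy selector. The text does not say which reinforcement learning algorithm Jori uses, so epsilon-greedy is an assumption made purely for illustration; `refine_policy`, `reward_fn`, and the parameter values are hypothetical.

```python
import random


def refine_policy(candidates, reward_fn, steps=100, epsilon=0.1, seed=0):
    """Estimate the value of each candidate response by trial and feedback."""
    rng = random.Random(seed)
    values = {c: 0.0 for c in candidates}  # running mean reward per candidate
    counts = {c: 0 for c in candidates}    # how often each was tried
    for _ in range(steps):
        untried = [c for c in candidates if counts[c] == 0]
        if untried:
            choice = untried[0]                  # try everything once first
        elif rng.random() < epsilon:
            choice = rng.choice(candidates)      # exploration
        else:
            choice = max(candidates, key=lambda c: values[c])  # exploitation
        reward = reward_fn(choice)
        counts[choice] += 1
        # incremental update of the mean reward for the chosen candidate
        values[choice] += (reward - values[choice]) / counts[choice]
    best = max(candidates, key=lambda c: values[c])
    return best, values


def toy_reward(response):
    """Stand-in for the reward mechanism (assumption): fixed scores."""
    return 1.0 if response == "response_b" else 0.2


best, values = refine_policy(["response_a", "response_b"], toy_reward, steps=50)
```

With a fixed reward per response, the estimates settle on the true scores and the higher-scoring response is selected; with noisy feedback the estimates would approach their means gradually, which is the convergence behaviour described above.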

Reinforcement Learning from Human Feedback (RLHF)

If Jori fails to learn a new skill because the LLM's initial training lacks the necessary reference knowledge, a human can submit a code + query pair to Jori's long-term memory. When similar queries are asked in the future, the LLM no longer has to come up with code from scratch: the submitted pair is picked up by the L2 and L3 caches, and the previous code is used as a starting point. This lets Jori get past roadblocks that it could not clear by brute-forcing code with the LLM alone.
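
A minimal sketch of this workflow follows, assuming word-count vectors and a similarity threshold of `0.6`. The `LongTermMemory` class, the `build_prompt` helper, and the prompt wording are hypothetical and only illustrate the idea of seeding the LLM with previously submitted code.

```python
import math
from collections import Counter


def _cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LongTermMemory:
    """Hypothetical store of human-submitted (query, code) pairs."""

    def __init__(self):
        self.records = []  # list of (query word counts, code)

    def submit(self, query, code):
        self.records.append((Counter(query.lower().split()), code))

    def starting_point(self, query, threshold=0.6):
        """Return the code of the most similar stored query, or None."""
        query_vec = Counter(query.lower().split())
        scored = [(_cosine(query_vec, vec), code) for vec, code in self.records]
        if not scored:
            return None
        score, code = max(scored, key=lambda pair: pair[0])
        return code if score >= threshold else None


def build_prompt(query, memory):
    """Seed the LLM prompt with earlier code when a similar query exists."""
    seed_code = memory.starting_point(query)
    if seed_code is None:
        return f"Write code to answer: {query}"
    return f"Adapt this earlier code to answer: {query}\n\n{seed_code}"


memory = LongTermMemory()
memory.submit("count patients with diabetes",
              "df[df.condition == 'diabetes'].shape[0]")
prompt = build_prompt("count patients with asthma", memory)
```

Here the second query shares most of its words with the submitted one, so the stored code is included in the prompt rather than leaving the LLM to start from nothing.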
