The \"open models\" available online are still largely closed-source; the matrices are basically binary blocks that describe the weights the model assigns to each tensor.
Basically, before sending the prompt to the LLM, the client does a search to find additional context. There are lots of tools for doing this, but the most popular seem to be from the AI community, and work by converting the user input to a 'vector' of NLP tokens, using a specialized 'vector database' to find other 'chunks' of related inputs, then add those to the message before sending it to the LLM
A super powerful capability, from what I can tell, developers generally implement this by telling the LLM how to structure its output to make tool calls, then attempting to parse the LLMs output to detect tool calls, run the tools, and append the result to the message going into the LLM.