\$\begingroup\$

Context: I'm a complete newbie trying to get good at structuring my code. I'm aware that I'm supposed to do my own research with docs, but I don't know what specific docs I should be reading for this specific use case.

Question: What kind of improvements can I make to this Python code snippet?

Context about code: Uses .txt files for RAG and prints out response in terminal. LM Studio compatible.

    import os
    from dotenv import load_dotenv
    from sentence_transformers import SentenceTransformer
    import faiss
    import numpy as np
    import requests
    import torch


    # Envs
    def load_config():
        load_dotenv()
        return {
            "api_url": os.getenv("API_URL"),
            "api_key": os.getenv("API_KEY"),
            "model": os.getenv("LLM_MODEL", "deepseek-r1-distill-qwen-7b"),
            "embedding_model": os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        }


    def chunk_text(text, chunk_size=500, overlap=50):
        tokens = text.split()
        return [
            " ".join(tokens[i: i + chunk_size])
            for i in range(0, len(tokens), chunk_size - overlap)
        ]


    def build_embeddings(chunks, config):
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        embedder = SentenceTransformer(config["embedding_model"], device=device)
        return embedder.encode(chunks, batch_size=32, show_progress_bar=True)


    def build_faiss_index(embeddings):
        dim = embeddings.shape[1]
        quantizer = faiss.IndexFlatL2(dim)
        index = faiss.IndexIVFFlat(quantizer, dim, min(50, len(embeddings)))
        if len(embeddings) > 0:
            index.train(np.array(embeddings).astype("float32"))
            index.add(np.array(embeddings).astype("float32"))
        return index


    def save_index(index, filepath="index/index.faiss"):
        os.makedirs(os.path.dirname(filepath), exist_ok=True)
        faiss.write_index(index, filepath)


    def load_index(dim, filepath="index/index.faiss"):
        if os.path.exists(filepath):
            return faiss.read_index(filepath, faiss.IO_FLAG_MMAP)
        else:
            index = faiss.IndexFlatL2(dim)
            save_index(index, filepath)
            return index


    def retrieve_chunks(question, chunks, embedder, index, top_k =3):
        q_emb = embedder.encode([question])
        _, idxs = index.search(np.array(q_emb).astype("float32"), top_k)
        return [chunks[i] for i in idxs[0]]


    def ask_with_context(question, config, chunks, embedder, index):
        try:
            snippets = retrieve_chunks(question, chunks, embedder, index)
            prompt = (
                "Use the following context to answer the question in a clear, concise, and friendly way.\n\n"
                + "\n\n".join(snippets)
                + f"\n\nQuestion: {question}\nAnswer:"
            )
            headers = {
                "Authorization": f"Bearer {config['api_key']}",
                "Content-Type": "application/json",
            }
            payload = {
                "model" : config["model"],
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 4096, # Max tokens
                "temperature": 0.2, # Temperature
            }
            r = requests.post(config["api_url"], headers=headers, json=payload)
            r.raise_for_status()
            return r.json()["choices"][0]["message"]["content"]
        except requests.exceptions.RequestException as e:
            print(f"API Request Error: {e}")
            return "Sorry, I couldn't get a response from the language model."
        except Exception as e:
            print(f"Error: {e}")
            return "An error occurred while processing your Question."


    if __name__ == "__main__":
        cfg = load_config()
        with open("data/message.txt", "r", encoding="utf-8") as f:
            text = f.read()
        chunks = chunk_text(text)
        embeddings = build_embeddings(chunks, cfg)
        index = build_faiss_index(embeddings)
        embedder = SentenceTransformer(cfg["embedding_model"])
        print("Ready! Ask your questions (type 'exit' to quit).")
        while True:
            q = input("Q: ")
            if q.lower() in ("exit", "quit"):
                break
            ans = ask_with_context(q, cfg, chunks, embedder, index)
            ans = ans.replace("<think>", "").replace("</think>", "").strip()
            print("A:", ans)

Question2: Where can I learn how to get better at this? I'm a high school student, and I don't have access to any paid courses yet, but if there are any book suggestions, that would be greatly appreciated.

Shades is a new contributor to this site.
\$\endgroup\$
  • \$\begingroup\$The current title of your question is too generic to be helpful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How do I ask a good question?.\$\endgroup\$
    – BCdotWEB
    Commented 11 hours ago

1 Answer

\$\begingroup\$

Documentation

What stands out most about the code is the lack of documentation. The PEP 8 style guide recommends adding docstrings for functions, and a docstring at the top of the file is a good place to summarize the purpose of the code. Since you are new to Python and/or coding, this is a good habit to build early.

For example, you could add something like this at the top of your code:

    """
    Uses text files for RAG and prints out response in terminal.
    LM Studio compatible.
    """

You should explain what "RAG" and "LM Studio" are, and you should describe the format of the text files.

Since the code expects some environment variables to be set, you can explain that as well.
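
As a rough sketch, the module docstring could list the four environment variables that load_config reads, with the defaults taken from your code. The REQUIRED_ENV_VARS tuple and the missing_env_vars helper are hypothetical additions of mine, not part of your code; they assume that API_URL and API_KEY, which have no default, must always be set.

```python
"""
Answer questions about a text file using retrieval-augmented generation (RAG).

Environment variables (read from the process environment or a .env file):
    API_URL          URL the chat request is POSTed to (no default).
    API_KEY          Key sent as a Bearer token (no default).
    LLM_MODEL        Model name; defaults to "deepseek-r1-distill-qwen-7b".
    EMBEDDING_MODEL  Embedding model name; defaults to "all-MiniLM-L6-v2".
"""
import os

# Hypothetical: the two variables that load_config() reads without a default.
REQUIRED_ENV_VARS = ("API_URL", "API_KEY")


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    Args:
        environ: Mapping to inspect; defaults to os.environ.

    Returns:
        list[str]: Missing variable names, in REQUIRED_ENV_VARS order.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]
```

Calling missing_env_vars() right after load_dotenv() would let the script stop with a clear message instead of failing later inside requests.post.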

The function docstrings should describe their input variable types and their return types.
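
Here is one way that could look for chunk_text. The function body is unchanged from your code; the type hints and the docstring wording are my own suggestion, and the layout (Args/Returns sections) is just one common convention.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of whitespace-separated words.

    Args:
        text: The full document text.
        chunk_size: Maximum number of words per chunk.
        overlap: Number of words shared by consecutive chunks;
            must be smaller than chunk_size.

    Returns:
        A list of chunks, each a single space-joined string.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i: i + chunk_size])
        for i in range(0, len(tokens), chunk_size - overlap)
    ]
```

For instance, chunk_text("a b c d e", chunk_size=3, overlap=1) returns ["a b c", "c d e", "e"], which makes the overlap behaviour easy to see.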

Here are some other minor style suggestions.

Simpler

This line:

    if len(embeddings) > 0:

is simpler as:

    if len(embeddings):

There is no need to compare against 0.
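
A small illustration of why the two forms are equivalent, using a plain list so it runs without any extra packages. The describe function is a hypothetical name used only for this example.

```python
def describe(rows):
    """Return a short label saying whether rows has any items."""
    # len() returns an int, and any nonzero int is truthy,
    # so this behaves exactly like `if len(rows) > 0:`.
    if len(rows):
        return "non-empty"
    return "empty"
```

Do keep the len() call, though. In your code embeddings is a NumPy array, and writing `if embeddings:` would raise a ValueError for an array with more than one element.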

Comments

These comments are not needed and should be removed:

    "max_tokens": 4096, # Max tokens
    "temperature": 0.2, # Temperature

Those comments are redundant with the code. Comments should be used to elaborate on the code, when you feel it is necessary.
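
As a sketch of what a more useful comment might look like, here is the payload pulled into a hypothetical build_payload helper. The reasons given in the comments are my guesses at your intent, so replace them with your actual reasoning.

```python
def build_payload(model, prompt):
    """Return the JSON body for the chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Assumed rationale: leave room for the model's reasoning text
        # plus the final answer.
        "max_tokens": 4096,
        # Assumed rationale: low randomness keeps answers close to the
        # retrieved context.
        "temperature": 0.2,
    }
```

The comments now say why the values were chosen, which the code alone cannot tell the reader.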

\$\endgroup\$
