How do I structure this file better?

Question

Context: I'm a complete newbie trying to get good at structuring my code. I'm aware that I'm supposed to do my own research with docs, but I don't know what specific docs I should be reading for this specific use case.

Question: What kind of improvements can I make to this Python code snippet?

Context about code: Uses .txt files for RAG and prints out response in terminal. LM Studio compatible.

import os
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import requests
import torch

# Envs
def load_config():
    load_dotenv()
    return {
        "api_url": os.getenv("API_URL"),
        "api_key": os.getenv("API_KEY"),
        "model": os.getenv("LLM_MODEL", "deepseek-r1-distill-qwen-7b"),
        "embedding_model": os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
    }

def chunk_text(text, chunk_size=500, overlap=50):
    tokens = text.split()
    return [
        " ".join(tokens[i: i + chunk_size])
        for i in range(0, len(tokens), chunk_size - overlap)
    ]

def build_embeddings(chunks, config):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    embedder = SentenceTransformer(config["embedding_model"], device=device)
    return embedder.encode(chunks, batch_size=32, show_progress_bar=True)

def build_faiss_index(embeddings):
    dim = embeddings.shape[1]
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFFlat(quantizer, dim, min(50, len(embeddings)))
    if len(embeddings) > 0:
        index.train(np.array(embeddings).astype("float32"))
    index.add(np.array(embeddings).astype("float32"))
    return index

def save_index(index, filepath="index/index.faiss"):
    os.makedirs(os.path.dirname(filepath), exist_ok=True)
    faiss.write_index(index, filepath)

def load_index(dim, filepath="index/index.faiss"):
    if os.path.exists(filepath):
        return faiss.read_index(filepath, faiss.IO_FLAG_MMAP)
    else:
        index = faiss.IndexFlatL2(dim)
        save_index(index, filepath)
        return index

def retrieve_chunks(question, chunks, embedder, index, top_k =3):
    q_emb = embedder.encode([question])
    _, idxs = index.search(np.array(q_emb).astype("float32"), top_k)
    return [chunks[i] for i in idxs[0]]

def ask_with_context(question, config, chunks, embedder, index):
    try:
        snippets = retrieve_chunks(question, chunks, embedder, index)
        prompt = (
            "Use the following context to answer the question in a clear, concise, and friendly way.\n\n"
            + "\n\n".join(snippets)
            + f"\n\nQuestion: {question}\nAnswer:"
        )
        headers = {
            "Authorization": f"Bearer {config['api_key']}",
            "Content-Type": "application/json",
        }
        payload = {
            "model" : config["model"],
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 4096,  # Max tokens
            "temperature": 0.2,  # Temperature
        }
        r = requests.post(config["api_url"], headers=headers, json=payload)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
    except requests.exceptions.RequestException as e:
        print(f"API Request Error: {e}")
        return "Sorry, I couldn't get a response from the language model."
    except Exception as e:
        print(f"Error: {e}")
        return "An error occurred while processing your Question."

if __name__ == "__main__":
    cfg = load_config()
    with open("data/message.txt", "r", encoding="utf-8") as f:
        text = f.read()
    chunks = chunk_text(text)
    embeddings = build_embeddings(chunks, cfg)
    index = build_faiss_index(embeddings)
    embedder = SentenceTransformer(cfg["embedding_model"])
    print("Ready! Ask your questions (type 'exit' to quit).")
    while True:
        q = input("Q: ")
        if q.lower() in ("exit", "quit"):
            break
        ans = ask_with_context(q, cfg, chunks, embedder, index)
        ans = ans.replace("<think>", "").replace("</think>", "").strip()
        print("A:", ans)

Question 2: Where can I learn how to get better at this? I'm a high school student, and I don't have access to any paid courses yet, but any book suggestions would be greatly appreciated.

The current title of your question is too generic to be helpful. Please edit it to follow the site standard, which is for the title to simply state the task accomplished by the code. Please see "How do I ask a good question?". - BCdotWEB, commented 11 hours ago

toolic · Accepted Answer · 2025-04-28 00:33:38Z

Documentation

What stands out most about the code is the lack of documentation. The PEP 8 style guide recommends adding docstrings to functions and a docstring at the top of the file that summarizes the purpose of the code. This is a good habit to build while you are new to Python and/or coding.

For example, you could add something like this at the top of your code:

"""
Uses text files for RAG and prints out response in terminal.
LM Studio compatible.
"""

You should explain what "RAG" and "LM Studio" are, and you should describe the format of the text files.

Since the code expects some environment variables to be set, you can explain that as well.
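One way to document the environment variables is in the docstring of the function that reads them. The sketch below reuses the variable names and defaults from `load_config` in the question; the descriptions are my reading of how the script uses each value, and the `load_dotenv()` call is left out only to keep the example free of third-party imports.

```python
import os


def load_config():
    """Read the script's settings from environment variables.

    Environment variables:
        API_URL: URL the chat request is POSTed to. None if unset.
        API_KEY: Token sent as "Bearer <key>" in the Authorization
            header. None if unset.
        LLM_MODEL: Name of the chat model. Defaults to
            "deepseek-r1-distill-qwen-7b".
        EMBEDDING_MODEL: Name of the sentence-transformers model.
            Defaults to "all-MiniLM-L6-v2".

    Returns:
        dict: Keys "api_url", "api_key", "model" and "embedding_model".
    """
    # The original also calls load_dotenv() first to read a .env file.
    return {
        "api_url": os.getenv("API_URL"),
        "api_key": os.getenv("API_KEY"),
        "model": os.getenv("LLM_MODEL", "deepseek-r1-distill-qwen-7b"),
        "embedding_model": os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
    }
```

Running `help(load_config)` in a Python shell then shows this text, so a reader does not have to open the source to find out which variables to set.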

The function docstrings should describe their input variable types and their return types.
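As an illustration, here is a possible docstring for `chunk_text` from the question. The function body is unchanged; the type hints and the Google-style `Args`/`Returns` layout are my own choices rather than requirements, and `list[str]` assumes Python 3.9 or newer.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of whitespace-separated words.

    Args:
        text: The full text to split.
        chunk_size: Maximum number of words per chunk.
        overlap: Number of words repeated at the start of the next
            chunk. Must be smaller than chunk_size.

    Returns:
        The chunks, in document order, each joined with single spaces.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i: i + chunk_size])
        for i in range(0, len(tokens), chunk_size - overlap)
    ]
```

Writing the docstring also surfaces constraints that the code leaves implicit, such as `overlap` having to be smaller than `chunk_size` (otherwise the `range` step is zero or negative).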

Here are some other minor style suggestions.

Simpler

This line:

if len(embeddings) > 0:

is simpler as:

if len(embeddings):

There is no need to compare against 0.
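The two forms are equivalent because `len()` never returns a negative number and `0` is the only integer that is falsy. A small check, using plain lists as stand-ins for the embeddings and a hypothetical helper name `needs_training`:

```python
def needs_training(embeddings):
    """Return True when there is at least one embedding to train on."""
    # len() is never negative, so "nonzero length" means the same as "> 0".
    return bool(len(embeddings))


# Compare both spellings on an empty, a one-row and a two-row input.
for sample in ([], [[0.1, 0.2]], [[0.1, 0.2], [0.3, 0.4]]):
    assert needs_training(sample) == (len(sample) > 0)
```

Keep the `len()` call, though: if `embeddings` is a NumPy array, the even shorter `if embeddings:` raises a `ValueError` for arrays with more than one element.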

Comments

These comments are not needed and should be removed:

"max_tokens": 4096,  # Max tokens
"temperature": 0.2,  # Temperature

Those comments are redundant with the code. Comments should be used to elaborate on the code, when you feel it is necessary.
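For contrast, a comment earns its place when it records why a value was chosen. In the sketch below, `build_payload` is a hypothetical helper and both rationales are guesses on my part; replace them with your actual reasons, or drop the comments if there are none.

```python
def build_payload(model, prompt):
    """Build the JSON body for the chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Assumed reason: leave room for the model's reasoning text
        # in addition to the final answer.
        "max_tokens": 4096,
        # Assumed reason: a low temperature keeps answers close to
        # the retrieved context.
        "temperature": 0.2,
    }
```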
