constraint solving graduation using HTML Parsing, pandas, and z3

Question

I'm not sure if this project fits on Code Review, but my code is getting extremely messy, and I would love some tips to clean it up!

Overview

The project is designed to take in an HTML file (a degree audit), and a couple of course catalog datasets from my University. Then, I output a potential schedule.

Here is an example:

Input: HTML report, JSON report

Output:

{
  "5": [
    "CS 448 - 3 Cr / GPA: 0 (H)",
    "GWS 201 - 3 Cr / GPA: 3.5 (E)",
    "EALC 305 - 3 Cr / GPA: 3.33 (H)",
    "IS 204 - 3 Cr / GPA: 3.64 (E)",
    "MACS 140 - 3 Cr / GPA: 3.76 (E)",
    "FSHN 442 - 3 Cr / GPA: 3.7 (E)"
  ],
  "6": [
    "AAS 100 - 3 Cr / GPA: 3.53 (E)",
    "CI 405 - 3 Cr / GPA: 3.72 (E)",
    "CSE 450 - 3 Cr / GPA: 0 (H)",
    "ANTH 350 - 3 Cr / GPA: 3.5 (E)",
    "ACE 100 - 4 Cr / GPA: 2.83 (H)"
  ],
  "7": [
    "CS 421 - 3 Cr / GPA: 3.39 (H)",
    "ANSC 305 - 3 Cr / GPA: 3.69 (E)",
    "LLS 357 - 3 Cr / GPA: 3.68 (E)",
    "CS 444 - 3 Cr / GPA: 3.56 (E)",
    "KIN 387 - 3 Cr / GPA: 3.88 (E)",
    "NPRE 402 - 3 Cr / GPA: 3.51 (E)"
  ]
}

Each key is the number of the semester in which to take the listed courses.

How it works

I first parse the HTML into a JSON file / object. I then take the object, and my CSV datasets, and perform the following actions:

Cleanup input datasets
Initialize my two constraint sets: is_taking_class_constraints and semester_constraints
Add minimum / maximum hours per semester, and min/max hard classes per semester
Add taken courses
Add prerequisite constraints
A large block to parse all constraints from the JSON object

This generally adds the following types of constraints:

Take n courses from a list
Take n hours from a list
Fulfill at least n out of these k subrequirements

The course list is transformed from a format like department=None, number="4**" into a list of courses that meet the criteria. There is also special handling for other cases.
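As an illustration only (a sketch, not the project's actual code), the expansion could look roughly like this. It assumes the catalog is available as plain (department, number) pairs; the name expand_pattern and the sample catalog are hypothetical.

```python
from typing import List, Optional, Tuple

Course = Tuple[str, str]  # (department, number), e.g. ('CS', '421')


def expand_pattern(catalog: List[Course],
                   department: Optional[str],
                   number: Optional[str]) -> List[Course]:
    """Return the catalog entries that match a (department, number) pattern.

    department=None matches any department, number=None matches any number,
    and a number such as '4**' matches every course at that hundred level.
    """
    matches = []
    for dept, num in catalog:
        if department is not None and dept != department:
            continue
        if number is None:
            matches.append((dept, num))
        elif number.endswith('**'):
            level = int(number[0]) * 100
            # Numbers may carry a section suffix such as '498 ABC'
            if level <= int(num.split(' ')[0]) < level + 100:
                matches.append((dept, num))
        elif num == number:
            matches.append((dept, num))
    return matches


# Hypothetical sample data
catalog = [('CS', '421'), ('CS', '225'), ('MATH', '415'), ('CS', '498 ABC')]
print(expand_pattern(catalog, None, '4**'))
```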

Parsing

from bs4 import BeautifulSoup
from bs4.element import Tag
import re

SPECIAL_TOPICS = {
    'CS': ['498']
}

def get_req_needs(needs_row):
    # hours number + hourslabel, subreqs number + subreqslabel, count number, countlabel
    hours_needed = needs_row.find('span', class_='hours').text
    subreqs_needed = needs_row.find('td', class_='subreqs').text
    courses = needs_row.find('span', class_='count').text
    return {
        'hours': float(hours_needed or 0),
        'subreqs': int(subreqs_needed or 0),
        'courses': int(courses or 0)
    }

def get_subreq_needs(needs_row):
    hours_text = needs_row.find('td', class_='hours')
    hours_needed = hours_text.text if hours_text is not None else 0
    courses_text = needs_row.find('td', class_='count')
    courses = courses_text.text if courses_text is not None else 0
    # print(needs_row)
    courseslabel_text = needs_row.find('td', class_='countlabel')
    courseslabel = courseslabel_text.text.strip() if courseslabel_text is not None else ''
    if courseslabel == 'COURSES TAKEN' or courseslabel == 'COURSE TAKEN':
        courses = 0
    return {
        'hours': float(hours_needed),
        'courses': int(courses)
    }

def parse_course_select(from_list):
    courses = []
    elements = [child for td in from_list.find_all('td') for child in td.children]
    for i, element in enumerate(elements):
        if type(element) is Tag:
            prev_join_text = elements[i-1].strip() if i > 0 else ','
            next_join_text = elements[i+1].strip() if i < len(elements) - 1 else ','
            next_element = elements[i+2] if i < len(elements) - 2 else None
            department = element['department'].strip()
            course_number = element['number'].strip().split(' ')[0]
            for special_topic in SPECIAL_TOPICS.get(department, []):
                if course_number.startswith(special_topic):
                    # Add section in for special topics
                    course_number = element.find('span', class_='number').text.replace('(X)', '').replace(department, '').strip()
                    break
            next_course_number = None if next_element is None else next_element['number'].strip()
            # Handle wildcard Courses, e.g. take any course in this department
            if course_number == '****':
                courses.append([{
                    'number': None,
                    'department': department
                }])
                continue
            # Handle level courses, e.g. take any 300 level course in this department
            elif course_number.endswith('**'):
                department = None if department == '*****' else department
                courses.append([{
                    'number': course_number,
                    'department': department
                }])
                continue
            if i == len(elements) - 1 or next_join_text in [',', '', '&']:
                if prev_join_text != 'OR':
                    courses.append([{
                        'number': course_number,
                        'department': department
                    }])
            elif next_join_text == 'OR':
                courses.append([{
                    'number': course_number,
                    'department': department,
                },{
                    'number': next_course_number,
                    'department': next_element['department'].strip(),
                }])
            elif next_join_text == 'TO':
                for course_number in range(int(course_number), int(next_course_number)):
                    courses.append([{
                        'number': str(course_number),
                        'department': department
                    }])
            else:
                raise ValueError('Invalid course join text: {}'.format(next_join_text))
    return courses

def parse_audit(html):
    audit = BeautifulSoup(html, 'html.parser')
    reqs_parsed = []
    reqs = audit.find_all('div', class_='requirement')
    courses_taken_section = None
    for req in reqs:
        req_name = req.find('div', class_='reqTitle').get_text("\n").strip()
        req_OK = 'Status_OK' in req["class"] or 'Status_NONE' in req["class"]
        if 'summary of courses taken' in req_name.lower():
            courses_taken_section = req
            continue
        if not req_OK and 'Status_NO' not in req["class"]:
            continue
        req_needs = None
        subreqs_parsed = []
        if not req_OK:
            req_table = req.find('tr', class_='reqNeeds')
            # Parse subreqs
            subreqs = req.find_all('div', class_='subrequirement')
            prev_subreq_number = 0
            for subreq in subreqs:
                # print(subreq)
                title = subreq.find('span', class_='subreqTitle')
                status_icon = subreq.find('span', class_='status')['class']
                status_icon_none = 'Status_NONE' in status_icon
                status_icon_ok = 'Status_OK' in status_icon
                subreq_number = subreq.find('span', class_='subreqNumber').text.replace(')', '').strip()
                if subreq_number == 'OR':
                    subreq_number = prev_subreq_number
                elif subreq_number == '':
                    subreq_number = len(subreqs_parsed) + 1
                else:
                    subreq_number = int(subreq_number)
                if title is None:
                    print(f'Warning: NO TITLE, using parent title ({req_name})')
                    subreq_name = f'{req_name}'
                else:
                    subreq_name = title.get_text("\n").strip()
                    if len(subreq_name) == 0:
                        print(f'Warning: EMPTY TITLE, using parent title ({req_name})')
                        subreq_name = f'{req_name}'
                subreq_OK = False
                if status_icon_ok:
                    subreq_OK = True
                elif title is None and req_OK:
                    subreq_OK = True
                elif title is not None and ('srTitle_substatusOK' in title["class"] or 'srTitle_substatusIP' in title["class"]):
                    subreq_OK = True
                # print(subreq_name, subreq_OK, title)
                # print(status_none)
                if title is not None and not subreq_OK and 'srTitle_substatusNO' not in title["class"]:
                    print("Skipping subreq: {}".format(subreq_name))
                    continue
                if courses_taken_section is None and 'courses counting toward' in subreq_name.lower():
                    courses_taken_section = subreq
                from_list = None
                from_list = subreq.find('table', 'selectcourses')
                if from_list:
                    from_list = from_list.find('td', class_='fromcourselist')
                courses = []
                if not subreq_OK and from_list:
                    courses = parse_course_select(from_list)
                # Parse general education subreqs by marking course_list as a string of the req code
                # See https://github.com/wadefagen/datasets/tree/master/geneds#data-format
                gened_lookup_table = {
                    'advanced composition': 'ACP',
                    'cultural studies': 'CS',
                    'humanities & the arts': 'HUM',
                    'natural sciences & technology': 'NAT',
                    'quantitative reasoning': 'QR',
                    'social & behavioral science': 'SBS',
                    'liberal education': 'LIB',
                }
                for key in gened_lookup_table:
                    if key in subreq_name.lower() or key.replace('&', 'and') in subreq_name.lower():
                        assert len(courses) == 0
                        courses = [[{
                            'department': 'GENED',
                            'number': gened_lookup_table[key]
                        }]]
                        break
                # This manually fixes up some major requirements where no course list is specified
                # This is for subreqs
                major_subrequirement_lookup_table = {
                    'all technical electives': [{ 'department': 'CS', 'number': '4**' }],
                    'total earned hours': [{ 'department': None, 'number': None }],
                    # https://las.illinois.edu/academics/requirements/minimum
                    'advanced hours completed on this campus': [[{ 'department': None, 'number': '4**', }], [{ 'department': None, 'number': '5**'}]]
                }
                for key in major_subrequirement_lookup_table:
                    if key in subreq_name.lower() :
                        assert len(courses) == 0
                        if len(major_subrequirement_lookup_table[key]) == 1:
                            courses = [major_subrequirement_lookup_table[key]]
                        else:
                            courses = major_subrequirement_lookup_table[key]
                        break
                subreq_needs = None
                table = subreq.find('table', class_='subreqNeeds')
                if table is None:
                    subreq_needs = {
                        'hours': 0,
                        'courses': len(courses)
                    }
                else:
                    subreq_needs = get_subreq_needs(table)
                subreq_needs['course_list'] = courses
                is_none = subreq_needs['hours'] == 0 and subreq_needs['courses'] == 0 and status_icon_none
                subreqs_parsed.append({
                    'name': subreq_name,
                    'subreq_number': subreq_number,
                    'OK': subreq_OK or is_none,
                    'needs': subreq_needs
                })
                prev_subreq_number = subreq_number
            if req_table is None:
                req_needs = {
                    'hours': 0,
                    'subreqs': len(list(filter(lambda x: x['OK'] == False, subreqs_parsed))),
                    'courses': 0
                }
            else:
                req_needs = get_req_needs(req_table)
            # Manually fixup some requirements by adding a fake prereq with a course list
            major_requirement_lookup_table = {
                'minimum of': {"name": "All Courses", "subreq_number": 1, "OK": True, "needs": { "hours": 0.0, "courses": 0, "course_list": [ [ { "department": None, "number": None } ] ]}}
            }
            for key in major_requirement_lookup_table:
                if key in req_name.lower():
                    # sum all the hours, courses in the subreqs
                    total = sum([subreq['needs']['hours'] + subreq['needs']['courses'] for subreq in subreqs_parsed])
                    if total == 0:
                        subreqs_parsed = [major_requirement_lookup_table[key]]
                    else:
                        print(f'warning-- not adding fake subreq for {req_name} because total is not 0')
                    break
        reqs_parsed.append({
            'name': req_name,
            'req_number': len(reqs_parsed) + 1,
            'OK': req_OK,
            'needs': req_needs,
            'subreqs': subreqs_parsed
        })

    # Parse classes taken
    courses_taken = []
    assert courses_taken_section is not None
    courses = courses_taken_section.find_all('tr', class_='takenCourse')
    for course in courses:
        raw_term = course.find('td', class_='term').text.strip()
        term, year = raw_term[:2], int(raw_term[2:])
        raw_name = course.find('td', class_='course').text.strip()
        grade = course.find('td', class_='grade').text.strip()
        condition_code = course.find('td', class_='ccode').text.strip()
        # parse out name
        if condition_code == '>D':
            # Duplicated Course, ignore
            continue
        SPECIAL_TOPICS = {
            'CS': ['498']
        }
        match = re.match(r'([A-Z]{2,5})\s+([\d-]{3})(\s+[A-Z1-9]{1,3})?', raw_name)
        if match:
            department = match.group(1)
            number = match.group(2)
            section = match.group(3).strip() if match.groups == 3 else ''
            if number.endswith('--'):
                number = number[:-2] + '**'
            for topic in SPECIAL_TOPICS.get(department, []):
                if number == topic:
                    number = f'{number} {section}'
            courses_taken.append({
                # We count summer classes as spring classes
                'term': 'Spring' if term == 'SP' or term == 'SU' else 'Fall',
                'year': 2000 + year - (1 if term == 'WI' else 0),
                'department': department,
                'number': number,
                'is_transfer': grade == 'TR',
            })

    # Add semester int to courses taken
    first_year, first_term = get_first_term({'courses_taken': courses_taken})
    semester_offset = 1 if first_term == 'Spring' else 0
    for course in courses_taken:
        # Mark courses transferred into university before you took uni classes as 0
        if course['is_transfer'] and course['year'] < first_year or \
            (course['year'] == first_year and course['term'] == 'Fall' and first_term == 'Spring'):
            course['semester'] = 0
            continue
        course['semester'] = (course['year'] - first_year) * 2 + semester_offset + (0 if course['term'] == 'Spring' else 1)
    return {
        'requirements': reqs_parsed,
        'courses_taken': courses_taken
    }

def get_last_term(audit_result):
    last_term = None
    last_year = 0
    last_semester = -1
    for course in audit_result['courses_taken']:
        if course['semester'] > last_semester:
            last_term = course['term']
            last_year = course['year']
            last_semester = course['semester']
    return last_year, last_term, last_semester

def get_first_term(audit_result):
    first_term = None
    first_year = 9999
    for course in audit_result['courses_taken']:
        if (course['year'] < first_year or (course['year'] == first_year and course['term'] == 'Fall')) and not course['is_transfer']:
            first_term = course['term']
            first_year = course['year']
    return first_year, first_term

Main Code

import lib
from pathlib import Path
import pandas as pd
import json
from dataclasses import dataclass
from z3 import *
import re
import math
import sys

@dataclass
class Config:
    min_hours: int = 12
    max_hours: int = 18
    gpa_hard_threshold: float = 3.5
    max_hard_classes: int = 2
    max_semesters: int = 7

SPECIAL_TOPICS = {
    'ADV': [400, 490], 'ANTH': [499], 'ARTD': [499], 'ARTS': [445],
    'CHLH': [494], 'CI': [499], 'CMN': [396, 496], 'CPSC': [499],
    'CS': [498], 'DANC': [451], 'ECE': [498], 'ENGL': [396, 461, 475],
    'EPSY': [490, 590], 'FIN': [490], 'GLBL': [499], 'INFO': [390, 490],
    'IS': [390, 490, 496, 497], 'JOUR': [460, 480], 'KIN': [494], 'LING': [490],
    'MACS': [395, 496], 'MCB': [493], 'MSE': [498], 'MUS': [404, 499],
    'NPRE': [498], 'PHIL': [380], 'PS': [300], 'PSYC': [496],
    'SOC': [396, 496], 'SOCW': [380], 'TE': [398]
}

def GetModel(config, sat):
    print('Solving...')
    if sat.check() == unsat:
        print('Unsatisfiable')
        exit()
    else:
        print('Satisfiable')
    m = sat.model()
    lookup = {}
    for i in range(config.max_semesters + 1):
        lookup[i] = []
    taking = []
    for item in m:
        if item.name().startswith('taking_') and m[item] == True:
            course = item.name().split('_')[1]
            taking.append(course)
    n = 0
    for item in m:
        if item.name().startswith('semester_') and m[item] == True:
            course = item.name().split('_')[2]
            n += 1
            semester = int(item.name().split('_')[1])
            if course in taking:
                lookup[semester].append(f'{course} - {hour_values[course]} Cr / GPA: {round(gpas.get(course, 0),2)} ({"H" if is_class_hard.get(course, True) else "E"})')
    return lookup

def generate_prereq_constraint(last_semester, prereq, f_name):
    offset = 1 if prereq['is_concurrent'] else 0
    constraints = []
    # If the prereq is not in the catalog it's impossible to take
    if prereq['course'] not in is_taking_class_constraints:
        return None
    # For each possible semester to take the class
    for i in range(last_semester+1, config.max_semesters+1):
        # If the prereq is not in the catalog, skip it
        # Ensure that the prereq is taken before the class for each combination
        prereq_semesters = [semester_constraints[prereq['course']][0] == True] + \
            [semester_constraints[prereq['course']][j] == True for j in range(last_semester+1, i+offset)]
        if len(prereq_semesters) == 1:
            # TODO: possible optimization (check if we can state that this prereq is impossible to take like this)
            pass
        constraint = And(
            semester_constraints[f_name][i] == True,
            Or(prereq_semesters)
        )
        constraints.append(constraint)
    return Or(
        semester_constraints[f_name][0] == True, # If the class is already taken, don't worry about prereqs
        And(
            is_taking_class_constraints[prereq['course']] == True,
            Or(constraints)
        )
    )

if __name__ == '__main__':
    config = Config()
    if len(sys.argv) < 2:
        print('Usage: python run.py <audit_file>')
        sys.exit(1)
    audit_contents = open(sys.argv[1], 'rb').read()
    audit = lib.parse_audit(audit_contents)
    last_year, last_term, last_semester = lib.get_last_term(audit)
    print(last_semester)
    # print(json.dumps(audit, indent=2))
    # remaining = lib.get_remaining_requirements(audit)
    requirements = audit['requirements']
    sat = Solver()

    PREREQ_PATH = Path('..', 'datasets', 'uiuc-prerequisites.json')
    GPA_PATH = Path('..', 'datasets', 'uiuc-gpa-dataset.csv')
    GENED_PATH = Path('..', 'datasets', 'gened-courses.csv')
    CATALOG_PATH = Path('..', 'datasets', 'course-catalog-cleaned.csv')
    LIBERAL_PATH = Path('..', 'datasets', 'liberal_education.csv')
    prereqs = json.load(open(PREREQ_PATH, 'r', encoding='utf-8'))
    gpa = pd.read_csv(GPA_PATH)
    gened = pd.read_csv(GENED_PATH)
    catalog = pd.read_csv(CATALOG_PATH)
    liberal = pd.read_csv(LIBERAL_PATH)

    # deduplicate catalog
    # TODO handle multiple semesters
    catalog = catalog.drop_duplicates(subset=['Subject', 'Number', 'Name', 'Description']) # 'Year', 'Term',
    catalog = catalog[catalog.apply(lambda x: int(x['Number'].split(' ')[0]) < 500, axis=1)] # Drop 500 level courses (b/c of special topics they could be like 498 ASU)

    # Collapse all sections of the same course into one, summing counts of each grade
    gpa = gpa.groupby(['Subject', 'Number']).sum(numeric_only=True).reset_index()
    # drop rows older than 2019
    gpa = gpa[gpa['Year'] >= 2019]

    # Add GPA column
    # A+ A A- B+ B B- C+ C C- D+ D D- F
    mapping = {
        'A+': 4.0, 'A': 4.0, 'A-': 3.67,
        'B+': 3.33, 'B': 3.0, 'B-': 2.67,
        'C+': 2.33, 'C': 2.0, 'C-': 1.67,
        'D+': 1.33, 'D': 1.0, 'D-': 0.67,
        'F': 0.0,
    }
    # Each column has counts of each grade, calculate the weighted average
    gpa['GPA'] = gpa.apply(lambda row: sum([mapping[grade] * row[grade] for grade in mapping]) / sum([row[grade] for grade in mapping]), axis=1)

    # temporary - TODO more robust way to handle this
    is_class_hard = {}
    gpas = {}
    for index, row in gpa.iterrows():
        f_name = f'{row["Subject"]} {row["Number"]}'
        gpas[f_name] = row['GPA']
        if row['GPA'] < config.gpa_hard_threshold:
            is_class_hard[f_name] = True
        else:
            is_class_hard[f_name] = False
    # exit()

    # Course Name -> an int variable representing semester
    semester_constraints = {}
    is_taking_class_constraints = {}
    hour_values = {}

    # Add catalog course constraints
    seen = set()
    for index, row in catalog.iterrows():
        f_name = f'{row["Subject"]} {row["Number"]}'
        if f_name in seen:
            # raise Exception(f'Found duplicate course {f_name}')
            continue
        else:
            seen.add(f_name)
        credit_hours = row['Credit Hours'].replace('hours.', '')
        # https://stackoverflow.com/questions/45001775/find-all-floats-or-ints-in-a-given-string
        potential_hour_values = [int(float(h)) for h in re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", credit_hours)]
        if 'TO' in credit_hours:
            potential_hour_values = [x for x in range(potential_hour_values[0], potential_hour_values[1]+1)]
        # temporarily only allow 3 hour classes (and prefer minimum course hours)
        if min(potential_hour_values) < 3:
            continue
        hour_values[f_name] = min(filter(lambda x: x >= 3, potential_hour_values))
        for i in range(last_semester+1, config.max_semesters+1):
            if f_name not in semester_constraints:
                semester_constraints[f_name] = {}
            semester_constraints[f_name][i] = Bool(f'semester_{i}_{f_name}')
        # This is for already taken courses
        semester_constraints[f_name][0] = Bool(f'semester_0_{f_name}')
        # Must take the class exactly once if it's a requirement
        is_taking_class_constraints[f_name] = Bool(f'taking_{f_name}')
        req_constraint = PbEq([(value, 1) for (key, value) in semester_constraints[f_name].items()], 1)
        sat.add(If(is_taking_class_constraints[f_name], req_constraint, True))
        # TODO: is there a better way to implement this?
        sat.add(If(req_constraint, is_taking_class_constraints[f_name], True))

    # Add semester hour constraints
    for i in range(last_semester+1, config.max_semesters+1):
        hour_constraint_low = PbGe([(And(semester_bools[i], is_taking_class_constraints[f_name] == True), hour_values[f_name]) for (f_name, semester_bools) in semester_constraints.items()], config.min_hours)
        sat.add(hour_constraint_low)
        hour_constraint_high = PbLe([(And(semester_bools[i], is_taking_class_constraints[f_name] == True), hour_values[f_name]) for (f_name, semester_bools) in semester_constraints.items()], config.max_hours)
        sat.add(hour_constraint_high)

    # Add semester hardness constraints
    for i in range(last_semester+1, config.max_semesters+1):
        constraint = PbLe([
            (And(semester_bools[i], is_taking_class_constraints[f_name] == True), 1)
            for (f_name, semester_bools) in semester_constraints.items()
            if is_class_hard.get(f_name, True)
        ], config.max_hard_classes)
        sat.add(constraint)

    # Add already taken course constraints
    already_taken = []
    taken_courses = [f'{course["department"]} {course["number"]}' for course in audit['courses_taken']]
    for f_name in semester_constraints:
        if f_name in taken_courses:
            sat.add(semester_constraints[f_name][0] == True)
            for i in range(last_semester+1, config.max_semesters+1):
                sat.add(semester_constraints[f_name][i] == False)
            sat.add(is_taking_class_constraints[f_name] == True)
        else:
            sat.add(semester_constraints[f_name][0] == False)

    # Add prereq constraints
    for index, row in catalog.iterrows():
        f_name = f'{row["Subject"]} {row["Number"]}'
        if f_name in prereqs and f_name in is_taking_class_constraints:
            and_clauses = []
            # Each section needs to be satisfied
            for section in prereqs[f_name]:
                or_clause = []
                # Each section only needs one of the prereqs
                for prereq in section:
                    # All ways to satisfy the prereq
                    clause = generate_prereq_constraint(last_semester, prereq, f_name)
                    if clause is None:
                        # If we can't satisfy the prereq, don't add
                        continue
                    or_clause.append(clause)
                # TODO: If we can't satisfy any of the prereqs, don't add
                and_clauses.append(Or(or_clause))
            constraint = And(and_clauses) if len(and_clauses) > 0 else True
            # If we are taking the class, we need to satisfy all prereqs
            # Taking the class implies satisfying all prereqs
            if f_name not in taken_courses:
                sat.add(If(is_taking_class_constraints[f_name] == True, constraint, True))

    # Add degree requirements
    for requirement in requirements:
        if requirement['OK'] == True:
            continue
        req_needs = requirement['needs']
        n_subreqs_needed = req_needs['subreqs']
        n_courses_needed = req_needs['courses']
        n_hours_needed = math.ceil(req_needs['hours'])
        subreq_constraints = {}
        subreq_names = []
        req_course_list = []
        for subreq in requirement['subreqs']:
            # print(subreq['name'])
            needs = subreq['needs']
            subreq_n_needed = needs['courses']
            subreq_n_hours_needed = math.ceil(needs['hours'])
            course_list_expr = []
            subreq_course_list = []
            for or_courses in needs['course_list']:
                or_clause = []
                for course in or_courses:
                    number, department = course['number'], course['department']
                    course_name = f'{course["department"]} {course["number"]}'
                    # Constraint: GENED / Liberal Ed requirements
                    # This should NOT be a part of an OR clause, as we could take multiple
                    if department == 'GENED':
                        assert len(or_courses) == 1
                        if number == 'LIB':
                            for index, row in liberal.iterrows():
                                f_name = f'{row["Subject"]} {row["Number"]}'
                                if f_name in is_taking_class_constraints:
                                    subreq_course_list.append(f_name)
                                    course_list_expr.append(is_taking_class_constraints[f_name] == True)
                        else:
                            # Get gened courses with the column named number that isn't None
                            valid_courses = gened.loc[gened[number].notna()]
                            for index, row in valid_courses.iterrows():
                                f_name = row["Course"]
                                if row["Course"] in is_taking_class_constraints:
                                    subreq_course_list.append(f_name)
                                    course_list_expr.append(is_taking_class_constraints[f_name] == True)
                    # Constraint: Take any course in a department
                    # This should NOT be a part of an OR clause, as we could take multiple
                    elif number is None:
                        assert len(or_courses) == 1 # This should be the only course in the OR clause
                        if department is None:
                            # Literally any class
                            for index, row in catalog.iterrows():
                                f_name = f'{row["Subject"]} {row["Number"]}'
                                if f_name in is_taking_class_constraints:
                                    subreq_course_list.append(f_name)
                                    course_list_expr.append(is_taking_class_constraints[f_name] == True)
                        else:
                            valid_courses = catalog.loc[catalog['Subject'] == department]
                            for index, row in valid_courses.iterrows():
                                f_name = f'{row["Subject"]} {row["Number"]}'
                                if f_name in is_taking_class_constraints:
                                    subreq_course_list.append(f_name)
                                    course_list_expr.append(is_taking_class_constraints[f_name] == True)
                    # Constraint: Standard Course
                    elif number.split(' ')[0].isdigit() and department is not None:
                        if course_name in is_taking_class_constraints:
                            subreq_course_list.append(course_name)
                            or_clause.append(course_name)
                    # Constraint: Any n-level course
                    elif number.endswith('**'):
                        assert len(or_courses) == 1 # This should be the only course in the OR clause
                        level = int(number[0]) * 100
                        if department is not None:
                            # This is disgusting
                            valid_courses = catalog[catalog.apply(lambda x: x['Subject'] == department and int(x['Number'].split(' ')[0]) >= level and int(x['Number'].split(' ')[0]) < level + 100, axis=1)]
                        else:
                            valid_courses = catalog[catalog.apply(lambda x: int(x['Number'].split(' ')[0]) >= level and int(x['Number'].split(' ')[0]) < level + 100, axis=1)]
                        # This should NOT be a part of an OR clause, as we could take multiple
                        for index, row in valid_courses.iterrows():
                            f_name = f'{row["Subject"]} {row["Number"]}'
                            if f_name in is_taking_class_constraints:
                                subreq_course_list.append(f_name)
                                course_list_expr.append(is_taking_class_constraints[f_name] == True)
                    else:
                        raise Exception(f'Unknown course constraint: {course}')
                # Filter out bad courses
                or_clause = [course for course in or_clause if course in is_taking_class_constraints]
                # Don't add an empty or clause
                if len(or_clause) == 0:
                    continue
                # We must take at least one of the courses in the list to satisfy 1 course list of the subreq
                # TODO add to req_course_list
                course_list_expr.append(Or([is_taking_class_constraints[course] == True for course in or_clause]))
            # Express that we need to take at least n choose k courses in the future (after last semester)
            # https://stackoverflow.com/questions/43081929/k-out-of-n-constraint-in-z3py
            if len(course_list_expr) == 0:
                if subreq['OK'] == False:
                    print(f'**WARNING** No valid courses for {subreq["name"]} -- must be manually inputted')
                    # DEFAULT TO ALL CLASSES (HACK)
                    for index, row in catalog.iterrows():
                        f_name = f'{row["Subject"]} {row["Number"]}'
                        if f_name in is_taking_class_constraints:
                            subreq_course_list.append(f_name)
                            course_list_expr.append(is_taking_class_constraints[f_name] == True)
                else:
                    continue
            # We must satisfy at least n course lists to satisfy the subreq
            # print(course_list_expr)
            subreq_ncourses_constraint = PbGe([(course_expr, 1) for course_expr in course_list_expr], subreq_n_needed)
            if subreq_n_hours_needed > 0:
                sat.add(PbGe([
                    (And(is_taking_class_constraints[course] == True, semester_constraints[course][0] == False), hour_values[course])
                    for course in subreq_course_list
                ], subreq_n_hours_needed) == True
                )
            # print(n_needed, subreq_constraint)
            subreq_names.append(subreq['name'])
            if subreq_n_needed > 0 and subreq['subreq_number'] not in subreq_constraints:
                subreq_constraints[subreq['subreq_number']] = []
            if subreq_n_needed > 0:
                subreq_constraints[subreq['subreq_number']].append(subreq_ncourses_constraint)
            req_course_list.extend(subreq_course_list)
        # print(req_course_list, n_hours_needed, requirement['name'])
        # Add req course hours
        if n_hours_needed > 0:
            sat.add(PbGe([
                (And(is_taking_class_constraints[course] == True, semester_constraints[course][0] == False), hour_values[course])
                for course in req_course_list
            ], n_hours_needed) == True)
        if len(subreq_constraints) > 0 and n_subreqs_needed > 0:
            # Group by subreq_number
            # We must satisfy at least n subreqs to satisfy the req
            req_constraint = PbGe([
                (Or(subreq_constraint_list) == True, 1)
                for (i, subreq_constraint_list) in subreq_constraints.items()
            ], n_subreqs_needed)
            sat.add(req_constraint == True)
        else:
            pass

    model = GetModel(config, sat)
    print(json.dumps(model, indent=2))

The main feedback I want is how to segment out this code into functions. Additionally, it would be nice to have some ideas for a rewrite where I don't convert to a JSON object, and then convert back, but instead use Python classes/some other method.

There's several input files missing to run it, would you be able to provide them too? Otherwise feedback will be a bit lacking since we can't run it (the XYZ_PATH ones). - ferada, commented Feb 24, 2023 at 14:46

Reinderien · Accepted Answer · 2025-03-25 22:11:42Z

"The main feedback I want is how to segment out this code into functions"

You've done this quite successfully other than parse_audit, which needs to be broken up further. As a rough starting point, any time you have a block of code that starts with a comment such as # Parse classes taken, try pulling it out into its own function, e.g. def parse_classes_taken(courses_taken_section).
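As a minimal sketch of one such extraction, the course-name matching inside the "Parse classes taken" block could become its own small function. The name parse_course_name and the sample strings are hypothetical; the regular expression is the one from the question. This version tests the optional third group directly, whereas the original compares match.groups (a method) with 3, which is never true, so the section always ends up empty there.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the question: department, 3-character number, optional section
COURSE_RE = re.compile(r'([A-Z]{2,5})\s+([\d-]{3})(\s+[A-Z1-9]{1,3})?')


def parse_course_name(raw_name: str) -> Optional[Tuple[str, str, str]]:
    """Split a raw course cell into (department, number, section).

    Returns None when the text does not look like a course.
    """
    match = COURSE_RE.match(raw_name)
    if match is None:
        return None
    department, number, section = match.groups()
    section = section.strip() if section else ''
    # The audit writes level placeholders as e.g. '1--'; normalise to '1**'
    if number.endswith('--'):
        number = number[:-2] + '**'
    return department, number, section


# Hypothetical sample inputs
print(parse_course_name('CS 498 AB'))
print(parse_course_name('CS 1--'))
```

A function of this size is easy to unit test without any HTML at all, which is a large part of the benefit of splitting parse_audit up.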

This is a lot of code, so I'll only pick at it in bits and pieces.

def get_req_needs(needs_row): should probably be hinted as def get_req_needs(needs_row: Tag) -> dict[str, float | int]:. However, it's not a great idea to pass around data in dictionaries. Instead consider classes, especially immutable classes.
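A minimal sketch of what such an immutable class might look like, using a frozen dataclass. The class name ReqNeeds and the from_text helper are hypothetical; the fields mirror the keys of the dictionary returned by get_req_needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReqNeeds:
    """What is still needed to complete a requirement."""
    hours: float
    subreqs: int
    courses: int

    @classmethod
    def from_text(cls, hours: str, subreqs: str, courses: str) -> 'ReqNeeds':
        # Empty cells are treated as zero, as in the original get_req_needs
        return cls(
            hours=float(hours or 0),
            subreqs=int(subreqs or 0),
            courses=int(courses or 0),
        )


needs = ReqNeeds.from_text('6.0', '', '2')
print(needs)
```

Because the dataclass is frozen, an accidental assignment such as needs.hours = 0 raises an error instead of silently changing shared state, and attribute access (needs.hours) replaces string-keyed lookups that a type checker cannot verify.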

You have two different coalescing styles: hours_needed or 0 and hours_text.text if hours_text is not None else 0. The second one is probably safer as it's more explicit and doesn't rely on truthy/falsy semantics.
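One way to keep the explicit style consistent is a small helper, sketched below. The name text_or is hypothetical, and SimpleNamespace is only a stand-in for a bs4 Tag so that the example runs without any HTML.

```python
from types import SimpleNamespace


def text_or(node, default=''):
    """Return the stripped text of node, or default when node is None."""
    return node.text.strip() if node is not None else default


found = SimpleNamespace(text=' 3 ')  # stand-in for a Tag returned by find()
missing = None                       # find() returns None when nothing matches

print(int(text_or(found, '0')))
print(int(text_or(missing, '0')))
```

Note that this only covers the missing-element case; a cell that exists but is empty would still need its own handling before being passed to int() or float().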

I don't see a strainer (SoupStrainer) used in your call to BeautifulSoup(); you should pass one via parse_only so that only the parts of the document you need are parsed. See Parsing only part of a document.

Your __main__ block is way too long. It needs to be put into several functions.
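As one example of a piece that could be lifted out, here is a rough sketch of the credit-hour handling as a standalone function. The name parse_credit_hours is hypothetical, the number pattern is a simplified stand-in for the regular expression in the question, and the sample strings are assumptions about what the 'Credit Hours' column looks like.

```python
import re
from typing import Optional

NUMBER_RE = re.compile(r'\d+(?:\.\d+)?')


def parse_credit_hours(text: str, minimum: int = 3) -> Optional[int]:
    """Return the smallest credit-hour value a course offers.

    Returns None when no number is found or when the course can be taken
    for fewer than `minimum` hours, mirroring the `continue` in the
    original loop.
    """
    values = [int(float(h)) for h in NUMBER_RE.findall(text)]
    if not values:
        return None
    if 'TO' in text and len(values) >= 2:
        # '1 TO 4' means every whole number of hours in that range
        values = list(range(values[0], values[1] + 1))
    if min(values) < minimum:
        return None
    return min(values)


# Hypothetical sample inputs
print(parse_credit_hours('3 hours.'))
print(parse_credit_hours('1 TO 4 hours.'))
```

The loop in the main block then reduces to calling this function and skipping the course when it returns None, and the other commented sections (GPA calculation, prerequisite constraints, degree requirements) can be treated the same way.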
