Class Routine Transformer
- class routinepy.lib.scraper.transformers.class_routine.ClassRoutineTableTransformer
Transforms raw HTML class routine tables into structured routine data.
This class handles the conversion of class routine HTML into organized collections of
ClassPeriod objects, including:
Parsing table metadata (program, intake, semester)
Extracting faculty information
Building a 2D matrix of the class routine
Converting raw HTML into structured objects
- CELL_TEXT_REGEX = '^(?P<ccode>[A-Z]{2,} \\d{3,}),FC:,(?P<fcode>[A-Z]+)(?:,B:,(?P<bnum>\\d+)(?: ⇒)?)?,R:,(?P<rnum>[\\d\\w-]+)$'
Regular expression pattern to parse structured cell text with named groups.
Sample cell texts:
CE 417,FC:,SRB,R:,4904
GED 1105,FC:,MSBN,R:,1504-B
CSE 232,FC:,MAFI,B:,2 ⇒,R:,418
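To make the named groups concrete, the following is a minimal sketch that applies the pattern to two of the sample cell texts with the standard `re` module. The pattern is copied from `CELL_TEXT_REGEX`; the `samples` and `parsed` names are illustrative only and are not part of the library.

```python
import re

# Pattern copied from CELL_TEXT_REGEX above (written here as raw strings).
CELL_TEXT_REGEX = (
    r"^(?P<ccode>[A-Z]{2,} \d{3,}),FC:,(?P<fcode>[A-Z]+)"
    r"(?:,B:,(?P<bnum>\d+)(?: ⇒)?)?,R:,(?P<rnum>[\d\w-]+)$"
)

# Illustrative inputs taken from the sample cell texts.
samples = [
    "CE 417,FC:,SRB,R:,4904",          # no building group present
    "CSE 232,FC:,MAFI,B:,2 ⇒,R:,418",  # explicit building group
]

# groupdict() returns every named group; unmatched optional groups are None.
parsed = [re.match(CELL_TEXT_REGEX, text).groupdict() for text in samples]
for groups in parsed:
    print(groups)
```

When the optional `B:` part is absent, `bnum` is `None` and only `rnum` carries location information.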
- transform_to_models(tables: list[RawClassRoutineTable], program_code: str) → list[ClassPeriod]
Transform multiple HTML routine tables into ClassPeriod models.
- Parameters:
tables (list[RawClassRoutineTable]) – List of extracted HTML routine tables to transform
program_code (str) – Program code of the routine (e.g., ‘006’, ‘001’)
- Returns:
List of transformed class periods
- Return type:
list[ClassPeriod]
- transform_to_model(data: RawClassRoutineTable, program_code: str, shift_time: ShiftTime = None) → list[ClassPeriod]
Transform a single HTML routine table into ClassPeriod models.
- Parameters:
data (RawClassRoutineTable) – Extracted HTML routine table data
program_code (str) – Program code of the routine (e.g., ‘006’, ‘001’)
shift_time (ShiftTime, optional) – Pre-determined shift time if available, defaults to None
- Raises:
ValueError – If the routine data cannot be transformed
- Returns:
List of ClassPeriod extracted from the table
- Return type:
list[ClassPeriod]
Transformation Steps:
Extracts routine metadata (program, semester, intake and section)
Builds 2D matrix from the routine HTML table
Extracts faculty information from the faculty table
Constructs ClassPeriod objects from the generated 2D matrix
- extract_routine_meta(tag: Tag) → dict[str, str]
Extract metadata from the routine HTML header table.
Parses program name, intake-section combination, and semester information from a
BeautifulSoup.Tag object representing a routine table header.
- Parameters:
tag (bs4.Tag) – BeautifulSoup.Tag containing the routine header table
- Returns:
Dictionary containing extracted metadata with keys:
program: Name of academic program
intake_section: Combined intake and section numbers
semester: Semester name and year
- Return type:
dict[str, str]
HTML Structure Expected:
The method expects a table row (<tr>) with three <td> elements containing:
Left-aligned: Program name (e.g., “Program: B.Sc. Engg. in CSE”)
Center-aligned: Intake-section (e.g., “Intake: 49 - 3”)
Right-aligned: Semester info (e.g., “Semester: Spring, 2023”)
Text Processing Examples:
# Program examples
"<td align='left'>Program: B.Sc. Engg. in CSE</td>" > {"program": "B.Sc. Engg. in CSE"}
# Intake-section examples
"<td align='center'>Intake:60 - 7</td>" > {"intake_section": "60 - 7"}
"<td align='center'>Intake:53 - 1\n\t\t\t(Major: HR)</td>" > {"intake_section": "53 - 1 (Major: HR)"}
# Semester examples
"<td align='right'>Semester:Spring, 2025</td>" > {"semester": "Spring, 2025"}
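The text processing examples above amount to dropping the leading label and collapsing runs of whitespace. The following is a rough sketch of that cleanup on plain strings. `clean_meta_text` is a hypothetical helper written for illustration, not the library's actual implementation, and it assumes the cell text has already been taken out of the `<td>` element.

```python
def clean_meta_text(cell_text: str) -> str:
    """Drop the leading label (text up to the first colon) and collapse whitespace."""
    # partition() splits on the FIRST colon only, so "(Major: HR)" stays intact.
    _label, _sep, value = cell_text.partition(":")
    # split() with no arguments discards newlines, tabs, and repeated spaces.
    return " ".join(value.split())


# Cell texts taken from the examples above.
cells = {
    "program": "Program: B.Sc. Engg. in CSE",
    "intake_section": "Intake:53 - 1\n\t\t\t(Major: HR)",
    "semester": "Semester:Spring, 2025",
}
meta = {key: clean_meta_text(text) for key, text in cells.items()}
print(meta)
```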
- extract_faculty_table(tag: Tag) → dict[str, TransformedClassFacultyTable]
Extract faculty information from an HTML table into structured data.
Parses a faculty information table where each row represents a course and its associated faculty members, converting it into a dictionary mapping course codes to faculty data objects. Keying the data by course code makes it easy to look up faculty details during period construction.
- Parameters:
tag (bs4.Tag) – BeautifulSoup.Tag containing the faculty table
- Returns:
Dictionary mapping course codes to TransformedClassFacultyTable objects containing:
course_code: The course identifier (e.g., “CSE 317”)
course_name: Full course name (e.g., “System Analysis & Design”)
faculty_code: Faculty code (e.g., “MDI”)
faculty_name: Full faculty name (e.g., “MD. Masudul Islam”)
- Return type:
dict[str, TransformedClassFacultyTable]
Expected Table Structure:
Each table row (<tr>) should contain exactly 4 cells (<td>) with:
Course code
Course name
Faculty short code
Faculty full name
<tr>
  <td>CSE 317</td>
  <td>System Analysis And Design</td>
  <td>MDI</td>
  <td>MD. Masudul Islam</td>
</tr>
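As an illustration of the course-code mapping described above, here is a self-contained sketch. It uses the standard library `html.parser` instead of BeautifulSoup so that it runs without extra dependencies. `FacultyRow`, `_RowCollector`, and `faculty_by_course` are hypothetical names, and skipping rows that do not have exactly four cells is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class FacultyRow:  # hypothetical stand-in for TransformedClassFacultyTable
    course_code: str
    course_name: str
    faculty_code: str
    faculty_name: str


class _RowCollector(HTMLParser):
    """Collects the text of every <td>, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data


def faculty_by_course(html: str) -> dict:
    collector = _RowCollector()
    collector.feed(html)
    collector.close()
    mapping = {}
    for cells in collector.rows:
        cells = [" ".join(cell.split()) for cell in cells]
        if len(cells) == 4:  # assumption: ignore rows without exactly 4 cells
            mapping[cells[0]] = FacultyRow(*cells)
    return mapping


row_html = (
    "<tr> <td>CSE 317</td> <td>System Analysis And Design</td> "
    "<td>MDI</td> <td>MD. Masudul Islam</td> </tr>"
)
faculty = faculty_by_course(row_html)
print(faculty["CSE 317"].faculty_name)
```

With a dictionary keyed by course code, finding the faculty for a parsed period is a single lookup rather than a scan over the table rows.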
- build_2D_routine_matrix(tag: Tag) → list[list[TransformedClassPeriod]]
Build a structured 2D matrix from the routine HTML table.
- Parameters:
tag (bs4.Tag) – BeautifulSoup.Tag containing the routine table
- Returns:
2D schedule matrix where:
First row contains time slot headers (from <th> elements)
Subsequent rows represent days with class periods (from <td> elements)
Each cell contains a TransformedClassPeriod object with parsed data
- Return type:
list[list[TransformedClassPeriod]]
- Raises:
ValueError – Provided table tag is invalid
Expected Table Structure:
First row: Time slot headers (<th> elements)
Subsequent rows: Period information (<td> elements)
Example Output Structure:
[
    ['Day/Time', '08:00 AM to 09:15 AM', ...],  # First row is time slots
    ['SAT', ...],
    ['SUN', '', '', TransformedClassPeriod(course_code='CSE 407', faculty_code='WMO', building='2', room='709'), ...],
    ...,
    ...,
    ...,
    ['FRI', ...]
]
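A consumer of this matrix typically pairs each occupied cell with its day (first column) and its time slot (first row). The sketch below shows one way to do that. `Period` is a hypothetical stand-in for `TransformedClassPeriod`, `iter_periods` is an illustrative helper, and the small matrix is a shortened version of the output structure shown above.

```python
from typing import NamedTuple


class Period(NamedTuple):  # hypothetical stand-in for TransformedClassPeriod
    course_code: str
    faculty_code: str
    building: str
    room: str


# Shortened matrix: header row plus two day rows, with '' for empty periods.
matrix = [
    ["Day/Time", "08:00 AM to 09:15 AM", "05:15 PM to 06:30 PM"],
    ["SAT", "", ""],
    ["SUN", "", Period("CSE 407", "WMO", "2", "709")],
]


def iter_periods(matrix):
    """Yield (day, time_slot, period) for every occupied cell."""
    header, *day_rows = matrix
    for row in day_rows:
        day = row[0]
        # Column i of a day row lines up with column i of the header row.
        for slot, cell in zip(header[1:], row[1:]):
            if isinstance(cell, Period):
                yield day, slot, cell


occupied = list(iter_periods(matrix))
print(occupied)
```

Because empty periods are kept as empty strings, column positions stay aligned with the header row and no separate index bookkeeping is needed.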
- parse_routine_row(cells: list[Tag]) → list[TransformedClassPeriod]
Parse a row of HTML table cells into routine period metadata.
Converts a list of HTML table cells into a row of routine data, with each cell either containing a
TransformedClassPeriod object for occupied periods or an empty string to preserve the period index.
- Parameters:
cells (list[Tag]) – List of BeautifulSoup.Tag objects representing table cells
- Returns:
List containing either:
TransformedClassPeriod objects for valid class periods
Empty strings for empty cells (empty periods)
Raw text for cells that do not match the regex
- Return type:
list[TransformedClassPeriod]
Cell Parsing Logic:
Extracts and cleans cell text content
For non-empty cells, attempts pattern matching with CELL_TEXT_REGEX
On successful match, creates a TransformedClassPeriod with:
  Course code
  Faculty code
  Building number
  Room number
Handles legacy building/room number formats
Preserves original text for unparseable cells
Example Input Tags:
[<th><em>Day</em>/<em>Time</em></th>, <th>08:00 AM to 09:15 AM</th>, ... , <th>05:15 PM to 06:30 PM</th>]
[<th>SUN</th>, ... , <td style="background: #fff ;"> <strong>CSE 417</strong><br/><strong>FC:</strong> SWL <br/><strong>R:</strong> 2710 </td>, ... , <td style="background: #FFF ;"> </td>]
[<th>THR</th>, ... , <td style="background: #fff ;"> <strong>CSE 476</strong><br/><strong>FC:</strong> SPS <br/><strong>B:</strong> 2 ⇒ <strong>R:</strong> 419 </td>, ... , <td style="background: #FFF ;"> </td>]
Example Output:
['Day/Time', '08:00 AM to 09:15 AM', ... , '05:15 PM to 06:30 PM']
['SUN', ... , TransformedClassPeriod(course_code='CSE 417', faculty_code='SWL', building='2', room='710'), ... , '']
['THR', ... , TransformedClassPeriod(course_code='CSE 476', faculty_code='SPS', building='2', room='419'), ... , '']
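The cell parsing logic can be sketched for a single cell as follows. This is not the library's implementation. `Period` and `parse_cell_text` are hypothetical names, and the input is assumed to be cell text already flattened into the comma-joined form shown in the sample cell texts. The legacy rule used here (a four-digit room with no `B:` part is split into a one-digit building and a three-digit room) is an assumption inferred only from the `R: 2710` example, which yields building `2` and room `710`.

```python
import re
from typing import NamedTuple, Optional, Union

# Pattern copied from CELL_TEXT_REGEX.
CELL_TEXT_REGEX = (
    r"^(?P<ccode>[A-Z]{2,} \d{3,}),FC:,(?P<fcode>[A-Z]+)"
    r"(?:,B:,(?P<bnum>\d+)(?: ⇒)?)?,R:,(?P<rnum>[\d\w-]+)$"
)


class Period(NamedTuple):  # hypothetical stand-in for TransformedClassPeriod
    course_code: str
    faculty_code: str
    building: Optional[str]
    room: str


def parse_cell_text(text: str) -> Union[Period, str]:
    text = text.strip()
    if not text:
        return ""  # empty period: keep the slot so column indexes stay aligned
    match = re.match(CELL_TEXT_REGEX, text)
    if match is None:
        return text  # unparseable cell: preserve the original text
    building, room = match["bnum"], match["rnum"]
    # ASSUMED legacy rule: "2710" with no building means building 2, room 710.
    if building is None and re.fullmatch(r"\d{4}", room):
        building, room = room[0], room[1:]
    return Period(match["ccode"], match["fcode"], building, room)


legacy = parse_cell_text("CSE 417,FC:,SWL,R:,2710")
modern = parse_cell_text("CSE 476,FC:,SPS,B:,2 ⇒,R:,419")
print(legacy)
print(modern)
```

Returning the raw text for unmatched cells, rather than raising, keeps the row length constant, which is what lets each column be matched against the time slot header.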