Class Routine Transformer

class routinepy.lib.scraper.transformers.class_routine.ClassRoutineTableTransformer

Transforms raw HTML class routine tables into structured routine data.

This class handles the conversion of class routine HTML into organized collections of ClassPeriod objects, including:

  1. Parsing table metadata (program, intake, semester)

  2. Extracting faculty information

  3. Building a 2D matrix of the class routine

  4. Converting raw HTML into structured objects

CELL_TEXT_REGEX = '^(?P<ccode>[A-Z]{2,} \\d{3,}),FC:,(?P<fcode>[A-Z]+)(?:,B:,(?P<bnum>\\d+)(?: ⇒)?)?,R:,(?P<rnum>[\\d\\w-]+)$'

Regular expression pattern to parse structured cell text with named groups.

Sample cell texts:

  • CE 417,FC:,SRB,R:,4904

  • GED 1105,FC:,MSBN,R:,1504-B

  • CSE 232,FC:,MAFI,B:,2 ⇒,R:,418
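
The sample texts above can be checked against the pattern directly. The sketch below copies the pattern into a raw string (so each doubled backslash in the repr becomes a single one) and prints the named groups for each sample; the variable names `samples` and `parsed` are illustrative only.

```python
import re

# Pattern copied from CELL_TEXT_REGEX above, written as a raw string.
CELL_TEXT_REGEX = (
    r"^(?P<ccode>[A-Z]{2,} \d{3,}),FC:,(?P<fcode>[A-Z]+)"
    r"(?:,B:,(?P<bnum>\d+)(?: ⇒)?)?,R:,(?P<rnum>[\d\w-]+)$"
)

samples = [
    "CE 417,FC:,SRB,R:,4904",
    "GED 1105,FC:,MSBN,R:,1504-B",
    "CSE 232,FC:,MAFI,B:,2 ⇒,R:,418",
]

# groupdict() returns None for the optional building group when "B:" is absent.
parsed = [re.match(CELL_TEXT_REGEX, text).groupdict() for text in samples]

for text, groups in zip(samples, parsed):
    print(text, "->", groups)
```

Only the third sample carries a `B:` field, so `bnum` is `None` for the first two.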

transform_to_models(tables: list[RawClassRoutineTable], program_code: str) → list[ClassPeriod]

Transform multiple HTML routine tables into ClassPeriod models.

Parameters:
  • tables (list[RawClassRoutineTable]) – List of extracted HTML routine tables to transform

  • program_code (str) – Program code of the routine (e.g., ‘006’, ‘001’)

Returns:

List of transformed class periods

Return type:

list[ClassPeriod]

transform_to_model(data: RawClassRoutineTable, program_code: str, shift_time: ShiftTime = None) → list[ClassPeriod]

Transform a single HTML routine table into ClassPeriod models.

Parameters:
  • data (RawClassRoutineTable) – Extracted HTML routine table data

  • program_code (str) – Program code of the routine (e.g., ‘006’, ‘001’)

  • shift_time (ShiftTime, optional) – Pre-determined shift time if available, defaults to None

Raises:

ValueError – If the routine data cannot be transformed

Returns:

List of ClassPeriod extracted from the table

Return type:

list[ClassPeriod]

Transformation Steps:

  1. Extracts routine metadata (program, semester, intake and section)

  2. Builds 2D matrix from the routine HTML table

  3. Extracts faculty information from the faculty table

  4. Constructs ClassPeriod objects from the generated 2D matrix

extract_routine_meta(tag: Tag) → dict[str, str]

Extract metadata from routine HTML header table.

Parses program name, intake-section combination, and semester information from a BeautifulSoup.Tag object representing a routine table header.

Parameters:

tag (bs4.Tag) – BeautifulSoup.Tag containing the routine header table

Returns:

Dictionary containing extracted metadata with keys:

  • program: Name of academic program

  • intake_section: Combined intake and section numbers

  • semester: Semester name and year

Return type:

dict[str, str]

HTML Structure Expected:

The method expects a table row (<tr>) with three <td> elements containing:

  1. Left-aligned: Program name (e.g., “Program: B.Sc. Engg. in CSE”)

  2. Center-aligned: Intake-section (e.g., “Intake: 49 - 3”)

  3. Right-aligned: Semester info (e.g., “Semester: Spring, 2023”)

Text Processing Examples:

# Program examples
"<td align='left'>Program: B.Sc. Engg. in CSE</td>"
> {"program": "B.Sc. Engg. in CSE"}

# Intake-section examples
"<td align='center'>Intake:60  -  7</td>"
> {"intake_section": "60 - 7"}

"<td align='center'>Intake:53  -  1\n\t\t\t(Major: HR)</td>"
> {"intake_section": "53 - 1 (Major: HR)"}

# Semester examples
"<td align='right'>Semester:Spring, 2025</td>"
> {"semester": "Spring, 2025"}
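
The text clean-up shown in these examples can be approximated in a few lines. This is a minimal sketch, not the library's implementation: `clean_meta_text` is a hypothetical helper that assumes the cell text has already been pulled out of the `<td>` element, drops everything up to the first colon, and collapses runs of whitespace.

```python
def clean_meta_text(cell_text: str) -> str:
    """Drop the 'Label:' prefix and collapse runs of whitespace (hypothetical helper)."""
    # partition() splits at the first colon only, so "(Major: HR)" stays intact.
    _label, _sep, value = cell_text.partition(":")
    return " ".join(value.split())


meta = {
    "program": clean_meta_text("Program: B.Sc. Engg. in CSE"),
    "intake_section": clean_meta_text("Intake:53  -  1\n\t\t\t(Major: HR)"),
    "semester": clean_meta_text("Semester:Spring, 2025"),
}
print(meta)
```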

extract_faculty_table(tag: Tag) → dict[str, TransformedClassFacultyTable]

Extract faculty information from an HTML table into structured data.

Parses a faculty information table in which each row represents a course and its associated faculty member, and converts it into a dictionary mapping course codes to faculty data objects. Keying by course code allows direct lookup of faculty details while the class periods are being constructed.

Parameters:

tag (bs4.Tag) – BeautifulSoup.Tag containing the faculty table

Returns:

Dictionary mapping course codes to TransformedClassFacultyTable objects containing:

  • course_code: The course identifier (e.g., “CSE 317”)

  • course_name: Full course name (e.g., “System Analysis And Design”)

  • faculty_code: Faculty code (e.g., “MDI”)

  • faculty_name: Full faculty name (e.g., “MD. Masudul Islam”)

Return type:

dict[str, TransformedClassFacultyTable]

Expected Table Structure:

Each table row (<tr>) should contain exactly 4 cells (<td>) with:

  1. Course code

  2. Course name

  3. Faculty short code

  4. Faculty full name

<tr>
    <td>CSE 317</td>
    <td>System Analysis And Design</td>
    <td>MDI</td>
    <td>MD. Masudul Islam</td>
</tr>
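
The row-to-dictionary mapping described above can be sketched with the standard library alone. The names here are assumptions: `FacultyRow` is a hypothetical stand-in for TransformedClassFacultyTable, and `faculty_by_course` takes an HTML string rather than a bs4.Tag so the example runs without BeautifulSoup.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class FacultyRow:  # hypothetical stand-in for TransformedClassFacultyTable
    course_code: str
    course_name: str
    faculty_code: str
    faculty_name: str


class _RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data


def faculty_by_course(html: str) -> dict[str, FacultyRow]:
    collector = _RowCollector()
    collector.feed(html)
    collector.close()
    result = {}
    for cells in collector.rows:
        cells = [" ".join(cell.split()) for cell in cells]
        if len(cells) != 4:
            continue  # skip rows that do not have exactly four cells
        row = FacultyRow(*cells)
        result[row.course_code] = row  # key by course code for direct lookup
    return result


faculty = faculty_by_course(
    "<tr><td>CSE 317</td><td>System Analysis And Design</td>"
    "<td>MDI</td><td>MD. Masudul Islam</td></tr>"
)
print(faculty["CSE 317"].faculty_name)
```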

build_2D_routine_matrix(tag: Tag) → list[list[TransformedClassPeriod]]

Build a structured 2D matrix from the routine HTML table.

Parameters:

tag (bs4.Tag) – BeautifulSoup.Tag containing the routine table

Returns:

2D schedule matrix where:

  • First row contains time slot headers (from <th> elements)

  • Subsequent rows represent days with class periods (from <td> elements)

  • Each cell contains TransformedClassPeriod object with parsed data

Return type:

list[list[TransformedClassPeriod]]

Raises:

ValueError – If the provided table tag is invalid

Expected Table Structure:

  • First row: Time slot headers (<th> elements)

  • Subsequent rows: Period information (<td> elements)

Example Output Structure:

[
    ['Day/Time', '08:00 AM to 09:15 AM', ...],  # First row is time slots
    ['SAT', ...],
    ['SUN', '', '', TransformedClassPeriod(course_code='CSE 407', faculty_code='WMO', building='2', room='709'), ...],
    ...,
    ...,
    ...,
    ['FRI', ...]
]
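
One way to consume a matrix of this shape is to pair each non-empty cell with its day and time slot. The sketch below is illustrative only: `iter_occupied` is a hypothetical helper, the matrix is a shortened placeholder with made-up slot values, and a plain dict stands in for a TransformedClassPeriod object.

```python
def iter_occupied(matrix):
    """Yield (day, time_slot, cell) for every non-empty cell (hypothetical helper)."""
    header, *day_rows = matrix
    time_slots = header[1:]  # header[0] is the 'Day/Time' corner label
    for day, *cells in day_rows:
        for time_slot, cell in zip(time_slots, cells):
            if cell != "":  # empty strings only preserve the period index
                yield day, time_slot, cell


# Shortened placeholder matrix; the dict stands in for TransformedClassPeriod.
matrix = [
    ["Day/Time", "08:00 AM to 09:15 AM", "09:15 AM to 10:30 AM"],
    ["SAT", "", ""],
    ["SUN", "", {"course_code": "CSE 407", "room": "709"}],
]

occupied = list(iter_occupied(matrix))
print(occupied)
```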

parse_routine_row(cells: list[Tag]) → list[TransformedClassPeriod]

Parse a row of HTML table cells into routine period metadata.

Converts a list of HTML table cells into a row of routine data, with each cell either containing a TransformedClassPeriod object for occupied periods or an empty string to preserve the period index.

Parameters:

cells (list[Tag]) – List of BeautifulSoup.Tag representing table cells

Returns:

List containing either:

  • TransformedClassPeriod objects for valid class periods

  • Empty strings for empty cells (empty periods)

  • Raw text for cells that do not match the regex

Return type:

list[TransformedClassPeriod]

Cell Parsing Logic:

  1. Extracts and cleans cell text content

  2. For non-empty cells, attempts pattern matching with CELL_TEXT_REGEX

  3. On successful match, creates TransformedClassPeriod with:
    • Course code

    • Faculty code

    • Building number

    • Room number

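  4. Handles the legacy format in which a cell has no B: field and the room number carries the building (in the example below, R: 2710 becomes building 2, room 710)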

  5. Preserves original text for unparseable cells
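
Steps 1 to 3 and 5 can be sketched with the standard library. This is an approximation under stated assumptions, not the library's code: `parse_cell` is a hypothetical helper that takes an HTML string instead of a bs4.Tag, joins the stripped text fragments with commas (which yields the form shown in the CELL_TEXT_REGEX samples), and returns the regex groups as a dict instead of a TransformedClassPeriod. The legacy building/room handling of step 4 is left out because its exact rule is not stated here.

```python
import re
from html.parser import HTMLParser

# Pattern copied from CELL_TEXT_REGEX, written as a raw string.
CELL_TEXT_REGEX = (
    r"^(?P<ccode>[A-Z]{2,} \d{3,}),FC:,(?P<fcode>[A-Z]+)"
    r"(?:,B:,(?P<bnum>\d+)(?: ⇒)?)?,R:,(?P<rnum>[\d\w-]+)$"
)


class _TextCollector(HTMLParser):
    """Collect the stripped, non-empty text fragments of a cell."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def parse_cell(cell_html: str):
    collector = _TextCollector()
    collector.feed(cell_html)
    collector.close()
    text = ",".join(collector.parts)  # e.g. "CSE 476,FC:,SPS,B:,2 ⇒,R:,419"
    if not text:
        return ""  # empty period: keep the slot so indexes stay aligned
    match = re.match(CELL_TEXT_REGEX, text)
    # Unparseable cells (such as time slot headers) keep their raw text.
    return match.groupdict() if match else text


cell = parse_cell(
    '<td style="background: #fff ;"> <strong>CSE 476</strong><br/>'
    "<strong>FC:</strong> SPS   <br/><strong>B:</strong> 2 ⇒ "
    "<strong>R:</strong> 419     </td>"
)
print(cell)
```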

Example Input Tags:

[<th><em>Day</em>/<em>Time</em></th>, <th>08:00 AM to 09:15 AM</th>, ... ,<th>05:15 PM to 06:30 PM</th>]
[<th>SUN</th>, ... , <td style="background: #fff ;"> <strong>CSE 417</strong><br/><strong>FC:</strong> SWL   <br/><strong>R:</strong> 2710     </td>, ... , <td style="background:  #FFF ;"> </td>]
[<th>THR</th>, ... , <td style="background: #fff ;"> <strong>CSE 476</strong><br/><strong>FC:</strong> SPS   <br/><strong>B:</strong> 2 ⇒ <strong>R:</strong> 419     </td>, ... , <td style="background:  #FFF ;"> </td>]

Example Output:

['Day/Time', '08:00 AM to 09:15 AM', ... , '05:15 PM to 06:30 PM']
['SUN', ... , TransformedClassPeriod(course_code='CSE 417', faculty_code='SWL', building='2', room='710'), ... , '']
['THR', ... , TransformedClassPeriod(course_code='CSE 476', faculty_code='SPS', building='2', room='419'), ... , '']