HTML Parsers

Parsers for extracting routine data from HTML web pages.

Routine Webpage Parser

class routinepy.lib.scraper.parsers.html.routine_page.RoutinePageParser(html: str)

A parser for extracting routine information from university web pages.

This class defines the CSS selectors used to locate and extract the different types of routines (regular classes, exams, and supplementary exams) from the university’s web page.

CLASS_ACCORDION_SELECTOR = 'div#accordionClassRoutine': CSS selector for the container that holds the class routine table.

CLASS_LINK_SELECTOR = 'a.btn.btn-primary': CSS selector for class routine links. Targets the primary button-style anchor that carries each "View" link.

EXAM_ACCORDION_SELECTOR = 'div#Exam_Routine': CSS selector for the container that holds the exam routine table.

EXAM_NAME_CELL_SELECTOR = 'td:nth-child(1)': CSS selector for exam name cells. Targets the first column of the exam routine tables, which holds the exam name.

ROUTINE_CONTAINER_SELECTOR = 'div#accordion.panel-group': CSS selector for the main container element that holds all routine sections.

SUP_EXAM_ACCORDION_SELECTOR = 'div#Sup_Exam_Routine': CSS selector for the container that holds the supplementary exam routine table.
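The three section selectors above each match a `div` by its `id`, so a quick pre-check can tell which sections a page contains before any extraction method is called. The following is a minimal sketch using only the standard library; `SectionProbe`, `present_sections`, the section labels, and the sample markup are all hypothetical and are not part of routinepy, whose own lookup mechanism is not described here.

```python
from html.parser import HTMLParser

# ids taken from the documented selectors; the labels are illustrative only.
SECTION_IDS = {
    "accordionClassRoutine": "class",
    "Exam_Routine": "exam",
    "Sup_Exam_Routine": "sup_exam",
}


class SectionProbe(HTMLParser):
    """Record which routine section containers appear in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.found: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Only <div> elements can match selectors of the form div#<id>.
        if tag != "div":
            return
        element_id = dict(attrs).get("id")
        if element_id in SECTION_IDS:
            self.found.add(SECTION_IDS[element_id])


def present_sections(html: str) -> set[str]:
    """Return the labels of the section containers found in `html`."""
    probe = SectionProbe()
    probe.feed(html)
    return probe.found


# Hypothetical markup: the real page's nesting may differ.
sample_html = (
    '<div id="accordion" class="panel-group">'
    '<div id="accordionClassRoutine"></div>'
    '<div id="Exam_Routine"></div>'
    "</div>"
)
sections = present_sections(sample_html)
```

In this sample, `sections` contains `"class"` and `"exam"` but not `"sup_exam"`, which is the situation in which a supplementary exam lookup would be expected to fail.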

__init__(html: str)

Initialize the RoutinePageParser with HTML content to be parsed.

Parameters: html (str) – Raw HTML string containing the routine page content.
Raises: ValueError – If the provided HTML is empty.

get_class_routine_links()

Extract class routine links for all programs.

Parses the HTML table identified by CLASS_ACCORDION_SELECTOR to retrieve the section title and a list of class routine links.

Returns:

A dictionary containing:

title (str) - The title of the class routine section.
links (list[dict]) - A list of dictionaries, each with:
- program_code (str) - The academic program code.
- semester_code (str) - The semester code.
- link (str) - The URL to the class routine.

Return type:

dict[str, list[dict[str,str]]]

Example Output:

{
    "title": "Class Routine of Spring 2025",
    "links": [
        {
            "program_code": "001",
            "semester_code": "610,064",
            "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=001&semNo=610,064",
        },
        {
            "program_code": "006",
            "semester_code": "610,064",
            "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=006&semNo=610,064",
        },
    ],
}
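
As a usage sketch, a result with the shape shown above can be indexed by program code using only the standard library. The helper name `index_class_links` is hypothetical and not part of routinepy, and the assumption that the `p` query parameter always equals `program_code` is drawn only from the two sample links.

```python
from urllib.parse import parse_qs, urlparse


def index_class_links(result: dict) -> dict[str, str]:
    """Map each program_code to its class routine URL.

    `result` is assumed to have the shape documented for
    get_class_routine_links(); this helper is illustrative only.
    """
    index: dict[str, str] = {}
    for entry in result["links"]:
        # Cross-check the code against the `p` query parameter of the URL.
        query = parse_qs(urlparse(entry["link"]).query)
        if query.get("p", [None])[0] != entry["program_code"]:
            raise ValueError(f"program code mismatch in {entry['link']}")
        index[entry["program_code"]] = entry["link"]
    return index


# Sample data copied from the example output above.
sample = {
    "title": "Class Routine of Spring 2025",
    "links": [
        {
            "program_code": "001",
            "semester_code": "610,064",
            "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=001&semNo=610,064",
        },
        {
            "program_code": "006",
            "semester_code": "610,064",
            "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=006&semNo=610,064",
        },
    ],
}
class_index = index_class_links(sample)
```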

get_exam_routine_links()

Extract exam routine links for all programs.

Parses the HTML table identified by EXAM_ACCORDION_SELECTOR to retrieve the section title and a list of exam routine links.

Raises:

ValueError – If the exam routine container is not found in the HTML.
RuntimeError – If parsing fails due to an invalid table structure or invalid data.

Returns:

A dictionary containing:

title (str) - The title of the exam routine section.
links (list[dict]) - A list of dictionaries, each mapping an exam name (str) to a list of single-entry dictionaries that map a link name (str) to its normalized URL (str).

Return type:

dict[str, list[dict]]

Example Output:

{
    "title": "Final Exam Schedule, Spring 2025",
    "links": [
        {
            "Day Program": [
                {
                    "CSE Program": "https://www.bubt.edu.bd/assets/frontend/uploads/CSE_MODIFY_12_12_12.pdf"
                },
                {
                    "Other Programs": "https://www.bubt.edu.bd/assets/frontend/uploads/Semester_Final_Examination_Schedule_Fall_2024_-for_students-3rd_Revised_docx.pdf"
                },
            ]
        },
        {
            "Evening Program": [
                {
                    "Download": "https://www.bubt.edu.bd/assets/frontend/uploads/updated_Final_Examination_Schedule,_Spring_2025,_Evening_Program.pdf"
                }
            ]
        },
    ],
}
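
Because the `links` structure is nested three levels deep, it is often easier to work with as flat rows. The sketch below shows one way to do that; `flatten_exam_links` is a hypothetical helper, not part of routinepy, and the sample URLs are placeholders rather than real schedule files.

```python
def flatten_exam_links(result: dict) -> list[tuple[str, str, str]]:
    """Flatten a result into (exam_name, link_name, url) rows.

    `result` is assumed to have the shape documented for
    get_exam_routine_links(); this helper is illustrative only.
    """
    rows: list[tuple[str, str, str]] = []
    for exam in result["links"]:
        # Each item maps one exam name to its list of named links.
        for exam_name, named_links in exam.items():
            for named_link in named_links:
                # Each named link is a single-entry {link name: URL} dict.
                for link_name, url in named_link.items():
                    rows.append((exam_name, link_name, url))
    return rows


# Placeholder data following the documented shape.
sample = {
    "title": "Final Exam Schedule, Spring 2025",
    "links": [
        {
            "Day Program": [
                {"CSE Program": "https://example.org/day-cse.pdf"},
                {"Other Programs": "https://example.org/day-other.pdf"},
            ]
        },
        {"Evening Program": [{"Download": "https://example.org/evening.pdf"}]},
    ],
}
rows = flatten_exam_links(sample)
```

The same helper should apply to any result that follows this documented shape, since it relies only on the nesting and not on specific exam or link names.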

get_sup_exam_routine_links()

Extract supplementary exam routine links for all programs.

Parses the HTML table identified by SUP_EXAM_ACCORDION_SELECTOR to retrieve the section title and a list of supplementary exam routine links.

Raises:

ValueError – If the supplementary exam routine container is not found in the HTML.
RuntimeError – If parsing fails due to an invalid table structure or invalid data.

Returns:

A dictionary containing:

title (str) - The title of the supplementary exam routine section.
links (list[dict]) - A list of dictionaries, each mapping an exam name (str) to a list of single-entry dictionaries that map a link name (str) to its normalized URL (str).

Return type:

dict[str, list[dict]]

Example Output:

{
    "title": "Supplementary Mid-Term Exam Schedule, Spring 2025",
    "links": [
        {
            "Day Program": [
                {
                    "Click here": "https://www.bubt.edu.bd/assets/frontend/uploads/Day_Program_Supplementry_Mid-Terml_Routine_Spring,_2025.pdf"
                }
            ]
        },
        {
            "Evening Program": [
                {
                    "Click here": "https://www.bubt.edu.bd/assets/frontend/uploads/Supplementary_Spring,_2025_Mid-Term_Examination_Evening_(NB)_.pdf"
                }
            ]
        },
    ],
}
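
The exam methods document two failure modes, ValueError and RuntimeError, so callers may want to treat a missing section differently from a malformed one. The following is a minimal sketch of such a wrapper; `try_extract` and the stub `_missing_section` are hypothetical names introduced here, and the stub only simulates the documented ValueError.

```python
from typing import Callable, Optional


def try_extract(
    extractor: Callable[[], dict],
) -> tuple[Optional[dict], Optional[str]]:
    """Call a zero-argument extractor and return (result, failure reason)."""
    try:
        return extractor(), None
    except ValueError as exc:
        # Documented case: the section container is absent from the HTML.
        return None, f"section missing: {exc}"
    except RuntimeError as exc:
        # Documented case: the table structure or data could not be parsed.
        return None, f"unparseable section: {exc}"


def _missing_section() -> dict:
    """Stub standing in for a parser method on a page without the section."""
    raise ValueError("container not found")


ok_result, ok_reason = try_extract(lambda: {"title": "t", "links": []})
bad_result, bad_reason = try_extract(_missing_section)
```

With a real parser instance, the call would presumably be `try_extract(parser.get_sup_exam_routine_links)`, passing the bound method without calling it.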

Class Routine Parser

class routinepy.lib.scraper.parsers.html.class_routine.ClassRoutineParser(html: str)

Parser for extracting class routine information from university HTML pages.

This parser extracts three main components from the HTML structure:

- Metadata table (containing routine header information)
- Main routine table (containing schedule data)
- Faculty table (containing instructor information)

TABLE_META_SELECTOR = 'table#HdtableRtn': CSS selector for the metadata table (contains semester/program info).

extract_routine_tables() → list[RawClassRoutineTable]

Extract and package all routine table parts from the HTML.

This method locates and extracts three related HTML tables for each class routine:

- Metadata table: Contains semester, program, and section information.
- Main routine table: Contains the class schedule.
- Faculty table: Contains teacher and course details.

Returns: A list of RawClassRoutineTable objects, each containing the metadata, routine, and faculty tables for a single class routine.
Return type: list[RawClassRoutineTable]