HTML Parsers
Parsers for extracting routine data from HTML web pages.
Routine Webpage Parser
- class routinepy.lib.scraper.parsers.html.routine_page.RoutinePageParser(html: str)
A parser for extracting routine information from university web pages.
This class defines the CSS selectors used to locate and extract the different types of routines (regular classes, exams, and supplementary exams) from the university's web page.
- CLASS_ACCORDION_SELECTOR = 'div#accordionClassRoutine'
CSS selector for the container that holds the class routine table.
- CLASS_LINK_SELECTOR = 'a.btn.btn-primary'
CSS selector for class routine links. Targets the primary button-style anchor that carries the "View" link.
- EXAM_ACCORDION_SELECTOR = 'div#Exam_Routine'
CSS selector for the container that holds the exam routine table.
- EXAM_NAME_CELL_SELECTOR = 'td:nth-child(1)'
CSS selector for exam name cells. Targets the first column of the exam routine tables, which holds the exam name.
- ROUTINE_CONTAINER_SELECTOR = 'div#accordion.panel-group'
CSS selector for the main container element that holds all routine sections.
- SUP_EXAM_ACCORDION_SELECTOR = 'div#Sup_Exam_Routine'
CSS selector for the container that holds the supplementary exam routine table.
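To make the selectors concrete, the sketch below uses only Python's standard-library `html.parser` to mimic what `div#accordionClassRoutine` combined with `a.btn.btn-primary` would match. The sample markup and the `ClassLinkCollector` class are invented for illustration. The documentation does not say which HTML library `RoutinePageParser` uses or how it applies these selectors internally.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Hypothetical helper: collect hrefs of a.btn.btn-primary
    found inside div#accordionClassRoutine."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside the target container (0 = outside)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1          # nested div inside the container
            elif attrs.get("id") == "accordionClassRoutine":
                self.depth = 1           # entered the class routine container
        elif tag == "a" and self.depth > 0:
            classes = (attrs.get("class") or "").split()
            if "btn" in classes and "btn-primary" in classes and attrs.get("href"):
                self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1              # leaving a div inside the container


# Invented sample markup, shaped only after the selector strings above.
SAMPLE = """
<div id="accordion" class="panel-group">
  <div id="accordionClassRoutine">
    <table><tr><td>B.Sc. in CSE</td>
      <td><a class="btn btn-primary" href="routine.php?p=006">View</a></td></tr></table>
  </div>
  <div id="Exam_Routine">
    <a class="btn btn-primary" href="exam.pdf">Download</a>
  </div>
</div>
"""

collector = ClassLinkCollector()
collector.feed(SAMPLE)
print(collector.hrefs)  # ['routine.php?p=006']
```

Only the link inside the class routine container is collected. The similarly styled link under `div#Exam_Routine` is skipped, which is the reason for scoping the link selector to a section container.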
- __init__(html: str)
Initialize the RoutinePageParser with HTML content to be parsed.
- Parameters:
html (str) – Raw HTML string containing the routine page content.
- Raises:
ValueError – If the provided HTML is empty
- get_class_routine_links()
Extract class routine links for all programs.
Parses the HTML table identified by CLASS_ACCORDION_SELECTOR to retrieve a title and a list of class routine links.
- Returns:
- A dictionary containing:
title (str) - The title of the class routine section.
- links (list[dict]) - A list of dictionaries, each with:
program_code (str) - The academic program code.
semester_code (str) - The semester code.
link (str) - The URL to the class routine.
- Return type:
dict[str, list[dict[str,str]]]
Example Output:
{
    "title": "Class Routine of Spring 2025",
    "links": [
        {
            "program_code": "001",
            "semester_code": "610,064",
            "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=001&semNo=610,064",
        },
        {
            "program_code": "006",
            "semester_code": "610,064",
            "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=006&semNo=610,064",
        },
    ],
}
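In the example output, `program_code` and `semester_code` match the `p` and `semNo` query parameters of each link. The sketch below checks that correspondence for one entry using only the standard library. The relationship is inferred from the example alone, and `codes_from_link` is a hypothetical helper, not part of routinepy.

```python
from urllib.parse import parse_qs, urlsplit


def codes_from_link(link: str) -> tuple[str, str]:
    """Hypothetical helper: read (program_code, semester_code) from the
    query string, assuming the 'p' and 'semNo' parameter names seen in
    the example output."""
    query = parse_qs(urlsplit(link).query)
    return query["p"][0], query["semNo"][0]


# One entry copied from the example output above.
entry = {
    "program_code": "001",
    "semester_code": "610,064",
    "link": "https://annex.bubt.edu.bd/global_file/routine.php?p=001&semNo=610,064",
}

program_code, semester_code = codes_from_link(entry["link"])
print(program_code, semester_code)
```

Note that the semester code contains a comma, so it should be treated as an opaque string rather than split or converted to a number.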
- get_exam_routine_links()
Extract exam routine links for all programs.
Parses the HTML table identified by EXAM_ACCORDION_SELECTOR to retrieve a title and a list of exam routine links.
- Raises:
ValueError – Exam routine container is not found in the HTML
RuntimeError – Parsing fails due to invalid table structure or data
- Returns:
- A dictionary containing:
title (str) - The title of the exam routine section.
- links (list[dict]) - A list of dictionaries, each mapping an exam name (str) to a list of dictionaries that map link names to normalized URLs.
- Return type:
dict[str, list[dict]]
Example Output:
{
    "title": "Final Exam Schedule, Spring 2025",
    "links": [
        {
            "Day Program": [
                { "CSE Program": "https://www.bubt.edu.bd/assets/frontend/uploads/CSE_MODIFY_12_12_12.pdf" },
                { "Other Programs": "https://www.bubt.edu.bd/assets/frontend/uploads/Semester_Final_Examination_Schedule_Fall_2024_-for_students-3rd_Revised_docx.pdf" },
            ]
        },
        {
            "Evening Program": [
                { "Download": "https://www.bubt.edu.bd/assets/frontend/uploads/updated_Final_Examination_Schedule,_Spring_2025,_Evening_Program.pdf" }
            ]
        },
    ],
}
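The returned structure is nested three levels deep (exam name, then link name, then URL), which can be awkward to iterate over. A minimal sketch of flattening it into rows follows. `flatten_exam_links` is a hypothetical helper, and the sample dictionary uses placeholder URLs rather than real ones.

```python
def flatten_exam_links(result: dict):
    """Hypothetical helper: yield (exam_name, link_name, url) tuples
    from the nested structure described in the Returns section."""
    for group in result["links"]:                  # one dict per exam name
        for exam_name, entries in group.items():
            for entry in entries:                  # each entry maps link name -> URL
                for link_name, url in entry.items():
                    yield exam_name, link_name, url


# Same shape as the example output, with placeholder URLs.
sample = {
    "title": "Final Exam Schedule, Spring 2025",
    "links": [
        {
            "Day Program": [
                {"CSE Program": "https://example.invalid/cse.pdf"},
                {"Other Programs": "https://example.invalid/other.pdf"},
            ]
        },
        {"Evening Program": [{"Download": "https://example.invalid/evening.pdf"}]},
    ],
}

rows = list(flatten_exam_links(sample))
for exam_name, link_name, url in rows:
    print(f"{exam_name} | {link_name} | {url}")
```

The same helper should apply to the supplementary exam result, since its documented return shape is identical.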
- get_sup_exam_routine_links()
Extract supplementary exam routine links for all programs.
Parses the HTML table identified by SUP_EXAM_ACCORDION_SELECTOR to retrieve a title and a list of supplementary exam routine links.
- Raises:
ValueError – Supplementary exam routine container is not found in the HTML
RuntimeError – Parsing fails due to invalid table structure or data
- Returns:
- A dictionary containing:
title (str) - The title of the supplementary exam routine section.
- links (list[dict]) - A list of dictionaries, each mapping an exam name (str) to a list of dictionaries that map link names to normalized URLs.
- Return type:
dict[str, list[dict]]
Example Output:
{
    "title": "Supplementary Mid-Term Exam Schedule, Spring 2025",
    "links": [
        {
            "Day Program": [
                { "Click here": "https://www.bubt.edu.bd/assets/frontend/uploads/Day_Program_Supplementry_Mid-Terml_Routine_Spring,_2025.pdf" }
            ]
        },
        {
            "Evening Program": [
                { "Click here": "https://www.bubt.edu.bd/assets/frontend/uploads/Supplementary_Spring,_2025_Mid-Term_Examination_Evening_(NB)_.pdf" }
            ]
        },
    ],
}
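Because the exam getters are documented to raise `ValueError` when the section container is missing, callers may want to treat a missing section as "no routine published" while still surfacing parse failures. The sketch below shows one possible pattern. `links_or_none` and the two stub functions are hypothetical stand-ins so the example runs without routinepy installed.

```python
def links_or_none(fetch):
    """Hypothetical wrapper: call a zero-argument link getter and return
    None when the routine container is missing (documented ValueError).
    RuntimeError (invalid table structure) is left to propagate."""
    try:
        return fetch()
    except ValueError:
        return None


# Stubs standing in for bound parser methods.
def missing_section():
    raise ValueError("container not found")


def present_section():
    return {"title": "Supplementary Mid-Term Exam Schedule, Spring 2025", "links": []}


absent = links_or_none(missing_section)
present = links_or_none(present_section)
print(absent, present["title"])
```

With a real parser instance, the call would presumably look like `links_or_none(parser.get_sup_exam_routine_links)`. Whether swallowing `ValueError` is appropriate depends on the application.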
Class Routine Parser
- class routinepy.lib.scraper.parsers.html.class_routine.ClassRoutineParser(html: str)
Parser for extracting class routine information from university HTML pages.
This parser extracts three main components from the HTML structure:
Metadata table (containing routine header information)
Main routine table (containing schedule data)
Faculty table (containing instructor information)
- TABLE_META_SELECTOR = 'table#HdtableRtn'
CSS selector for the metadata table (contains semester/program info)
- extract_routine_tables() → list[RawClassRoutineTable]
Extract and package all routine table parts from the HTML.
This method locates and extracts three related HTML tables for each class routine:
Metadata table: Contains semester, program and section information.
Main routine table: Contains the class schedule.
Faculty table: Contains teacher and course details.
- Returns:
A list of RawClassRoutineTable objects, each containing the metadata, routine, and faculty tables for a single class routine.
- Return type:
list[RawClassRoutineTable]