convert.py - OER-Forge Content Converter
Content Conversion Utilities for OERForge
Overview
oerforge.convert
provides functions for converting Jupyter notebooks (.ipynb
) and Markdown files to various formats, managing associated images, and updating a SQLite database with conversion status. It supports batch and single-file conversion, image extraction and copying, and database logging.
Functions
setup_logging
def setup_logging()
Configure logging for conversion actions. Logs to log/export.log
and the console.
query_images_for_content
def query_images_for_content(content_record, conn)
Query the database for all images associated with a content file.
Parameters
content_record
(dict): Content record dictionary.conn
: SQLite connection object.
Returns
list[dict]
: List of image records.
copy_images_to_build
def copy_images_to_build(images, images_root=IMAGES_ROOT, conn=None)
Copy images to the build images directory. Returns a list of new build paths.
Parameters
images
(list[dict]): List of image records.images_root
(str): Destination directory for images.conn
: SQLite connection object (optional).
Returns
list[str]
: List of copied image paths.
update_markdown_image_links
def update_markdown_image_links(md_path, images, images_root=IMAGES_ROOT)
Update image links in a Markdown file to point to copied images in the build directory.
Parameters
md_path
(str): Path to the Markdown file.images
(list[dict]): List of image records.images_root
(str): Images directory.
handle_images_for_markdown
def handle_images_for_markdown(content_record, conn)
Orchestrate image handling for a Markdown file: query, copy, and update links.
Parameters
content_record
(dict): Content record dictionary.conn
: SQLite connection object.
convert_md_to_docx
def convert_md_to_docx(src_path, out_path, record_id=None, conn=None)
Convert a Markdown file to DOCX using Pandoc. Updates DB conversion status if record_id
and conn
are provided.
Parameters
src_path
(str): Source Markdown file path.out_path
(str): Output DOCX file path.record_id
(int, optional): Content record ID.conn
: SQLite connection object (optional).
convert_md_to_pdf
def convert_md_to_pdf(src_path, out_path, record_id=None, conn=None)
Convert a Markdown file to PDF using Pandoc. Updates DB conversion status if record_id
and conn
are provided.
Parameters
src_path
(str): Source Markdown file path.out_path
(str): Output PDF file path.record_id
(int, optional): Content record ID.conn
: SQLite connection object (optional).
convert_md_to_tex
def convert_md_to_tex(src_path, out_path, record_id=None, conn=None)
Convert a Markdown file to LaTeX using Pandoc. Updates DB conversion status if record_id
and conn
are provided.
Parameters
src_path
(str): Source Markdown file path.out_path
(str): Output LaTeX file path.record_id
(int, optional): Content record ID.conn
: SQLite connection object (optional).
convert_md_to_txt
def convert_md_to_txt(src_path, out_path, record_id=None, conn=None)
Convert a Markdown file to plain TXT (extracts readable text). Updates DB conversion status if record_id
and conn
are provided.
Parameters
src_path
(str): Source Markdown file path.out_path
(str): Output TXT file path.record_id
(int, optional): Content record ID.conn
: SQLite connection object (optional).
batch_convert_all_content
def batch_convert_all_content(config_path=None)
Batch process all files in the content table. For each file, checks conversion flags and calls appropriate conversion functions. Organizes output to mirror TOC hierarchy.
Parameters
config_path
(str, optional): Path to_content.yml
config file.
CLI Usage
python convert.py batch
python convert.py single --src <source> --out <output> --fmt <format> [--record_id <id>]
Requirements
- Python 3.7+
- Pandoc (for docx, pdf, tex conversions)
- nbconvert
- markdown-it-py
- SQLite3
See Also
License
See LICENSE
in the project root.