Comic Panel Extractor: Open-source panel segmentation and mobile-ready CBZ packaging for digital comics research

Author: M. Ehsan Karim (affiliation 1)

Published: March 1, 2026

Summary

Comic Panel Extractor is an open-source tool that automatically segments comic pages into individual panels and repackages them into a mobile-optimized .cbz for panel-by-panel reading. It accepts .cbz and .pdf inputs, renders PDFs at 300 DPI, and detects panels with a classical computer-vision pipeline in Python. The project ships a desktop app, a local web backend, and version-pinned dependencies to support reproducible workflows. The detector combines Otsu thresholding with an adaptive fallback, morphological closing with a size-aware kernel, RETR_EXTERNAL contouring to ignore internal artifacts, IoU-based non‑maximum suppression (NMS), and a DPI‑independent reading‑order sorter with RTL support. These choices are intended to keep the tool effective on both clean and noisy scans across varied layout styles.

Panel segmentation is a foundational step for comics research and downstream pipelines (OCR, balloon/text extraction, character analysis, multimodal narrative modeling), while also improving small‑screen reading. This software focuses on local, deterministic processing that researchers can run without GPUs or pre‑trained models, complementing learning‑based approaches and facilitating reproducible digital scholarship.

Statement of need

Comics—spanning Western color books, Franco‑Belgian bandes dessinées, and manga—are rich, multimodal cultural artifacts. Digitized collections are growing quickly, but pages arrive as flat images whose higher‑level structure (panels, reading order) must be reconstructed before meaningful computational analysis can proceed. Current literature underscores the centrality and difficulty of robust panel segmentation across diverse page styles and scan conditions. A lightweight, locally runnable, and dependency‑pinned tool helps scholars convert unstructured pages into panel‑level assets for analysis, annotation, and dataset creation.

Comic Panel Extractor fills this gap with a transparent, classical CV pipeline (OpenCV + NumPy) and practical packaging back to panel‑wise .cbz. It specifically targets researchers and instructors who need: (1) a fast way to segment panels without installing deep‑learning frameworks; (2) consistent outputs for batch processing; and (3) predictable performance on historical scans (foxing, bleed, narrow gutters) as well as modern layouts.

State of the field

Early panel segmentation methods leveraged edge/line detection and polygon reconstruction to recover rectangular or quadrilateral frames on the page, which worked well for many grid‑like layouts. Later work developed manga‑specific procedures (e.g., closing open frames, adaptive page partitioning) to cope with irregular borders and stylistic effects. More recently, learning‑based approaches (instance segmentation, detectors, or U‑Net‑style models) and sequence understanding benchmarks have advanced performance and evaluation breadth, though they typically require GPUs and curated training data, and they can generalize unevenly across cultures and eras.

Positioning. Comic Panel Extractor does not aim to supplant learned models; instead it contributes a robust classical baseline that users can run out‑of‑the‑box across heterogeneous corpora, and it produces artifacts (panel crops and panel‑wise CBZs) that plug cleanly into downstream research steps. This pragmatic stance—classical CV for structure recovery and packaging—complements the field’s learning‑centric trajectory and supports reproducible pre‑processing at scale.

Software design and functionality

Implementation. The tool is written in Python. PDFs are rasterized with PyMuPDF, images are processed with OpenCV, and I/O relies on standard libraries. The repository includes a desktop UI (Tkinter + ttk) with thread‑safe progress updates (background worker → queue.Queue() → root.after(...)), so the interface remains responsive during long conversions. CBZ packaging is done with zipfile, and images are stored as high‑quality JPEGs (quality setting 90).
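
The worker → queue → after hand-off described above can be sketched as follows. This is a minimal illustration of the general Tkinter pattern, not the project's actual code: the function names (worker, drain, run_demo), the message tuples, and the polling intervals are all assumptions made for the example.

```python
import queue
import threading


def worker(pages, progress_q):
    """Background job: report progress through the queue, never touch widgets."""
    total = len(pages)
    for i, _page in enumerate(pages, start=1):
        # ... per-page conversion would happen here ...
        progress_q.put(("progress", i, total))
    progress_q.put(("done", total, total))


def drain(progress_q, on_message):
    """Deliver every pending message; return True once 'done' has been seen."""
    finished = False
    while True:
        try:
            msg = progress_q.get_nowait()
        except queue.Empty:
            return finished
        on_message(msg)
        finished = finished or msg[0] == "done"


def run_demo():
    """Wire the pattern to a small Tk window (hypothetical UI; needs a display)."""
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    bar = ttk.Progressbar(root, maximum=100)
    bar.pack(padx=20, pady=20)
    q = queue.Queue()

    def on_message(msg):
        _kind, done, total = msg
        bar["value"] = 100 * done / total  # executed on the main thread only

    def poll():
        if drain(q, on_message):
            root.after(500, root.destroy)
        else:
            root.after(50, poll)  # re-schedule the poll on the Tk event loop

    threading.Thread(target=worker, args=(list(range(20)), q), daemon=True).start()
    root.after(50, poll)
    root.mainloop()
```

Calling run_demo() on a machine with a display shows the bar filling while the window stays responsive. The point of the design is that only poll, which Tk runs on the main thread, ever touches a widget; the worker communicates exclusively through the queue.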

Detector pipeline (robust defaults).

  1. Grayscale + binarization. The detector applies Otsu thresholding and measures the white‑pixel ratio of the result; if that ratio indicates an under‑ or over‑segmented page (e.g., aged paper, uneven illumination), it falls back to adaptive Gaussian thresholding.
  2. Morphology. The binary mask is inverted (so borders are bright), then closed with a rectangular kernel sized as a small fraction of the page’s shorter dimension, which bridges broken ink borders without over‑thickening strokes.
  3. Contours (outer borders only). The algorithm uses cv2.RETR_EXTERNAL to ignore internal sub‑contours (e.g., speech balloons) and focuses on candidate panel regions.
  4. Screening & shape checks. Candidates outside a configurable relative area range or with extreme aspect ratios are discarded, which removes page‑sized backgrounds and thin slivers.
  5. IoU‑based NMS. Overlapping boxes are deduplicated via Intersection‑over‑Union Non‑Maximum Suppression, which is symmetric and DPI‑agnostic.
  6. Padding & clipping. Each surviving box is padded by a small percentage of page size and clipped to image bounds to avoid off‑edge crops.
  7. Reading order. Panels are sorted into rows by clustering on center‑y with tolerance tied to the median panel height (robust to DPI and page size). Rows are ordered top‑to‑bottom, and items within a row are ordered left‑to‑right (with programmatic support for right‑to‑left reading available in the underlying engine).

Packaging & I/O. The engine recursively enumerates page images—including nested paths inside CBZs—applies detection, writes a full‑page image plus all panel crops, and finally creates a panel‑wise .cbz bundle for small‑screen reading or downstream analysis. For reviewers, the version‑pinned environment and zero‑GPU requirements simplify reproducibility.
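
The enumeration and bundling steps can be sketched with the standard library alone. The helper names, the accepted extensions, and the zero-padded member naming scheme below are assumptions for illustration; the project's actual file layout may differ. Panel and page images are taken as already JPEG-encoded bytes.

```python
import io
import zipfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set of page formats


def list_page_images(cbz):
    """Return image member names from a CBZ, nested folders included, sorted."""
    with zipfile.ZipFile(cbz) as zf:
        names = [n for n in zf.namelist()
                 if not n.endswith("/") and Path(n).suffix.lower() in IMAGE_EXTS]
    return sorted(names)


def write_panel_cbz(out, pages):
    """Write a panel-wise CBZ.

    pages: list of (full_page_bytes, [panel_bytes, ...]) in reading order.
    JPEG data is already compressed, so members are stored without deflate.
    """
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        for p, (full_page, panels) in enumerate(pages, start=1):
            # Zero-padded names keep readers that sort lexicographically in order.
            zf.writestr(f"p{p:04d}_000_full.jpg", full_page)
            for k, panel in enumerate(panels, start=1):
                zf.writestr(f"p{p:04d}_{k:03d}.jpg", panel)


# Usage: build an archive in memory, then list it back.
buf = io.BytesIO()
write_panel_cbz(buf, [(b"page-1", [b"panel-1a", b"panel-1b"]), (b"page-2", [b"panel-2a"])])
members = list_page_images(buf)
```

Note that sorted() orders names lexicographically, so input archives with unpadded page numbers (for example 2.jpg and 10.jpg) would need a natural-sort key instead.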

Quality assurance

  • Deterministic local execution. Processing runs offline with explicit version pins, yielding stable results across systems and enabling easy peer replication.
  • Thread‑safe user interface. The desktop app dispatches heavy work to a daemon thread, and all UI updates run on the main thread, which polls a message queue via root.after(...); this avoids Tkinter race conditions.
  • Practical validation. The detector’s binarization fallback, scale‑aware morphology, RETR_EXTERNAL setting, and IoU‑NMS are designed to handle common failure modes: weak gutters, broken borders, internal balloons being mistaken as panels, and near‑duplicate boxes.
  • Extensibility. The detection function accepts an RTL flag and exposes thresholds that can be tuned in notebooks or scripts; the UI uses reasonable defaults.
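
To make the tunable pieces concrete, the following pure-Python sketch shows IoU-based NMS and a row-clustering reading-order sort with an RTL switch. It is a sketch under assumptions, not the project's API: the function names, the iou_thresh and row_tol parameters, and the larger-box-first NMS priority are hypothetical choices for the example. Boxes are (x1, y1, x2, y2) tuples.

```python
from statistics import median


def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_thresh=0.5):
    """Visit boxes largest first; drop any box that overlaps a kept one too much."""
    kept = []
    by_area = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    for box in by_area:
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept


def reading_order(boxes, rtl=False, row_tol=0.5):
    """Group boxes into rows by center-y, then order rows and items within rows."""
    if not boxes:
        return []
    # Tolerance scales with the median panel height, so it is resolution-independent.
    tol = row_tol * median(b[3] - b[1] for b in boxes)
    rows = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2
        if rows and abs(cy - rows[-1]["cy"]) <= tol:
            rows[-1]["items"].append(box)  # same row as the current anchor
        else:
            rows.append({"cy": cy, "items": [box]})  # start a new row
    ordered = []
    for row in rows:
        ordered.extend(sorted(row["items"], key=lambda b: b[0], reverse=rtl))
    return ordered
```

In a notebook, lowering row_tol makes the sorter stricter about what counts as one row, which can help on staggered layouts, while rtl=True flips the within-row order for manga.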

Examples and use cases

  1. Digital humanities & comics studies. Segment pages into panels to analyze layout conventions (panel density, gutter widths) and reading order across epochs and cultures; create panel‑level corpora for annotation and teaching.
  2. Computer vision preprocessing. Generate panel crops as inputs for balloon/text detection, OCR, character recognition, or multimodal models trained at panel scale rather than full pages.
  3. Mobile reading conversion. Convert legacy .pdf scans into panel‑wise CBZs for phones or tablets, without cloud services or GPU‑heavy dependencies.
  4. Dataset curation. Bootstrap panel‑level datasets from institutional scans or public‑domain repositories, complementing curated benchmarks with new material.

Research impact statement

By emphasizing accessibility, determinism, and packaging, Comic Panel Extractor helps the community standardize pre‑processing and reduce friction when moving from scanned pages to panel‑level analyses. Its robustness to scan artifacts and layout variability broadens applicability beyond narrow benchmarks and supports reproducible digital scholarship at scale, including teaching, archival enrichment, and the creation of new evaluation sets.

Acknowledgements

We acknowledge the open‑source ecosystem—especially OpenCV and the broader Python scientific stack—and the research community’s groundwork on panel segmentation, manga‑specific methods, and recent learning‑based advances.

AI usage disclosure

Technical descriptions were verified against the project’s source code and documentation with the assistance of a large language model. The software itself was conceptualized by the author, and was implemented with the assistance of several large language models.

References

  • Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.
  • van der Walt, S., Schönberger, J. L., Nunez‑Iglesias, J., Boulogne, F., Warner, J. D., Yager, N., Gouillart, E., Yu, T., & the scikit-image contributors. (2014). scikit‑image: image processing in Python. PeerJ 2:e453. https://doi.org/10.7717/peerj.453
  • Li, L., Wang, Y., Tang, Z., & Gao, L. (2012). Automatic comic page segmentation based on polygon detection. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-012-1241-7
  • Pang, X., Cao, Y., Lau, R. W. H., & Chan, A. B. (2014). A Robust Panel Extraction Method for Manga. ACM Multimedia. https://doi.org/10.1145/2647868.2654990
  • Xie, M., Lin, J., Liu, H., Li, C., & Wong, T.-T. (2025). Advancing Manga Analysis: Comprehensive Segmentation Annotations for the Manga109 Dataset. CVPR 2025.
  • Rishu, & Kukreja, V. (2024/2025). Decoding comics: a systematic literature review on recognition, segmentation, and classification techniques. Multimedia Tools and Applications.
  • Vivoli, E., Llabrés, A., Souibgui, M. A., Bertini, M., Valveny, E. L., & Karatzas, D. (2025). ComicsPAP: understanding comic strips by picking the correct panel. arXiv:2503.08561 (ICDAR 2025).