Your Name and Title:

David Palmquist, Systems Administration

Library, School, or Organization Name:

California State University, Fullerton Pollak Library

Co-Presenter Name(s):

 

Area of the World from Which You Will Present:

Fullerton, CA

Language in Which You Will Present:

English

Target Audience(s):

Library Administration Staff

Short Session Description (one line):

Building a practical book spine recognition system with YOLOv8 detection and Qwen2.5-VL OCR reconciled to Alma data.
 

Full Session Description (as long as you would like):

This session presents the design and implementation of an automated book spine recognition system for library inventory management. The project builds on the open-source foundation started by Min-Han Li (https://github.com/MinHanLiWesley/book-spine-recognition), which uses YOLOv8 for spine detection, RealESRGAN for image enhancement, Google Cloud Vision for OCR, and Gemini for metadata refinement.
 
Our version retains the YOLOv8 detection backbone, but we made different architectural choices in several key areas to better suit our environment and goals:
  • We replaced the cloud-based OCR and refinement pipeline with a single local Qwen2.5-VL vision-language model that directly performs text recognition on spine crops.
  • Instead of public book databases such as Open Library, we reconcile the extracted metadata against our institutional Alma library catalog data without enrichment steps.
The session will cover how we built our custom training dataset, how we trained both locally and on the NRP, and every stage of the pipeline at a high level: image input, spine detection, VLM-based OCR, metadata extraction, and catalog reconciliation.
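
To make the final reconciliation stage concrete, here is a minimal sketch of matching OCR output against catalog records. It is not the project's actual implementation: the `CatalogRecord` fields, the `reconcile` function, the use of Python's standard-library `difflib` for fuzzy matching, and the 0.6 threshold are all assumptions chosen for illustration, and a real Alma export would carry more fields.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class CatalogRecord:
    # Hypothetical subset of fields from a catalog export.
    mms_id: str
    title: str
    author: str


def normalize(text: str) -> str:
    """Lowercase the text and turn every non-alphanumeric run into one space."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(
    spine_text: str,
    catalog: list[CatalogRecord],
    threshold: float = 0.6,  # assumed cutoff, would need tuning on real data
) -> Optional[CatalogRecord]:
    """Return the best-scoring catalog record, or None if nothing clears the threshold."""
    best, best_score = None, 0.0
    for record in catalog:
        score = similarity(spine_text, f"{record.title} {record.author}")
        if score > best_score:
            best, best_score = record, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    catalog = [
        CatalogRecord("1", "Moby-Dick", "Herman Melville"),
        CatalogRecord("2", "Pride and Prejudice", "Jane Austen"),
    ]
    # OCR text from a spine crop, with a typical recognition error in the author name.
    match = reconcile("MOBY DICK  Melvile", catalog)
    print(match.mms_id if match else "no match")
```

Returning `None` below the threshold, rather than always taking the closest record, leaves unreadable or uncataloged spines available for manual review instead of silently assigning them to the wrong title.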

Websites / URLs Associated with Your Session:

https://www.dropbox.com/scl/fi/9h4vp7kvshm0mqoz0aswj/stacks_inventory.pdf?rlkey=sh9hnjejkxw5s21w32rijyegl&st=uhdfrrsf&dl=0
The dataset and public repository will be available by the presentation date.
