Skin-Path: A Vision-Language Dataset for Skin Cancer Pathology

A high-quality vision-language dataset for advancing AI-driven skin pathology research.

Abstract

Existing pathology vision-language (VL) datasets (e.g., PathCap, OpenPath, QUILT) suffer from low image quality, poor text-image alignment, and limited scalability, shortcomings that are especially acute in skin cancer pathology. To address them, we introduce Skin-Path, a high-quality VL dataset curated from 194 H&E-stained whole-slide images (WSIs) at 20× magnification, comprising 277,761 image patches (300×300 px) paired with expert-annotated captions. Covering 10 skin diseases (e.g., seborrhoeic keratosis, basal cell carcinoma, squamous cell carcinoma), Skin-Path supports VL model training, medical report generation, and disease classification.
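The abstract does not say how the 300×300 px patches were cut from each slide. As a minimal sketch, assuming a regular, non-overlapping grid (our assumption, not necessarily the authors' pipeline), the top-left corners of all full patches in a region can be enumerated as below; `grid_coordinates` and `PATCH_SIZE` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# Side length of each square patch in pixels, matching the 300×300 px patches above.
PATCH_SIZE = 300


def grid_coordinates(width: int, height: int, patch_size: int = PATCH_SIZE) -> List[Tuple[int, int]]:
    """Return the top-left (x, y) corner of every full, non-overlapping patch.

    Hypothetical helper: assumes a regular grid without overlap and discards
    partial patches along the right and bottom borders.
    """
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, patch_size)
        for x in range(0, width - patch_size + 1, patch_size)
    ]


if __name__ == "__main__":
    # A 1000×700 px region fits 3 patches across and 2 down.
    coords = grid_coordinates(1000, 700)
    print(len(coords))  # 6
    print(coords[:3])   # [(0, 0), (300, 0), (600, 0)]
```

Each (x, y) pair could then be read from the slide with a WSI reader such as OpenSlide's `read_region`. A practical pipeline would usually also discard background-only tiles, which this sketch does not attempt.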

Research Objectives

Figure 1(a): Example WSIs with associated captions and patches extracted at 20× magnification.


Figure 1(b): Example of a basal cell carcinoma (BCC) slide with its associated captions and patches.


The primary goal of the Skin-Path dataset is to provide a high-quality vision-language dataset for advancing AI-driven pathology research, particularly in skin cancer diagnosis and histopathological analysis. Specific objectives include:

Dataset Details

Dataset Access

Download: A demo version of the dataset can be downloaded via the Google Drive Link. The full version will be released soon.

Extract the archive into the following directory structure:


Skin-Path/
├── Images/     # Image patches
└── Captions/   # Corresponding diagnostic reports
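The README does not specify the file names or formats inside `Images/` and `Captions/`. Assuming, hypothetically, that each image and its caption share a file stem and that captions are UTF-8 `.txt` files, a minimal loader that pairs them might look like this (`pair_images_with_captions` and `IMAGE_SUFFIXES` are illustrative names, not part of the dataset release):

```python
from pathlib import Path
from typing import List, Tuple

# Assumed image extensions; the README does not state the actual file format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pair_images_with_captions(root: Path) -> List[Tuple[Path, str]]:
    """Pair each file in root/Images with root/Captions/<same stem>.txt.

    Hypothetical layout assumption: an image and its caption share a file
    stem, and captions are UTF-8 plain-text files. Images without a
    matching caption, and files with other extensions, are skipped.
    """
    images_dir = root / "Images"
    captions_dir = root / "Captions"
    pairs: List[Tuple[Path, str]] = []
    for image_path in sorted(images_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_path = captions_dir / f"{image_path.stem}.txt"
        if caption_path.is_file():
            caption = caption_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, caption))
    return pairs
```

For example, `pair_images_with_captions(Path("Skin-Path"))` would return a list of `(image_path, caption_text)` tuples; the suffix set and caption extension would need adjusting once the released file layout is known.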

Word Cloud Representation

Figure 2: Word cloud of the Skin-Path dataset.
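A word cloud sizes each term by how often it occurs in the text. How Figure 2 was produced is not stated; as an illustrative sketch, term frequencies over the captions could be counted as follows, using a small hypothetical stop-word list (`STOP_WORDS` and `top_terms` are names introduced here for illustration):

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small illustrative stop-word list (an assumption, not taken from the dataset authors).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}


def top_terms(captions: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase alphabetic tokens across captions and return the k most common."""
    counts = Counter()
    for caption in captions:
        tokens = re.findall(r"[a-z]+", caption.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(k)
```

The resulting counts, converted to a dict, could be rendered with a word-cloud library such as `wordcloud` via `WordCloud().generate_from_frequencies(...)`.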

Terms of Use