Auditing a Scientific Field through its Representations
A global atlas of digital dermatology
Under review · 2025
1University of Basel · 2HSLU · 3Northwestern University
Once we know when to trust representation geometry, the same geometry can audit not just a single dataset but an entire scientific archive. We aggregate 1,135,080 dermatology images from 29 public datasets into one shared representation space and quantify the field's structure from that geometry: how datasets overlap, where new ones genuinely add information, and which clinical phenotypes are missing entirely. The result is the first quantitative atlas of digital dermatology, showing systematic demographic skew, diminishing returns on new releases, and structural voids in the field's clinical coverage.
The dermatology archive at field scale
Dermatology AI's clinical reliability depends on the quality, diversity, and scale of the data it is trained on. For years the community has presumed the archive is biased: that pediatric patients are underrepresented, that the Global North dominates, that dark skin tones are missing. These have been intuitions and partial single-dataset audits, not a quantified field-level picture. SkinMap measures the bias the community has long presumed.
Without a field-level picture, the community operates without key performance indicators for data acquisition. New datasets get collected and released without a way to tell whether they expand the clinical map or replicate what is already known. Models trained on the resulting corpus inherit any blind spots silently, and external validations on nominally distinct datasets can report inflated generalisation when the datasets share representation structure.
Representation geometry gives us the lens. A learned representation organises samples by similarity, so distances between dataset embeddings, neighbourhood structure, and topological properties become directly meaningful for questions about coverage and redundancy. We build a single embedding space across 29 public datasets and use that space to measure what the field actually contains.
Building the atlas
The atlas combines two complementary representations, both trained from scratch on the union of the 29 datasets. Self-supervised image encoders learn a space that respects pure visual similarity, with no labels and no off-the-shelf foundation-model weights involved. In parallel, an image-text contrastive model is trained on templated captions assembled from the available metadata, which anchors clinical terminology into the visual manifold. Both streams are then projected into a shared low-dimensional space via a learned projector and combined into an ensemble. Because everything is trained from the data itself rather than fine-tuned from a general-purpose foundation model, the resulting space is shaped by the dermatology archive rather than by the inductive biases of an upstream pretraining set, so the audit measures the data and not an outside prior.
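A minimal sketch of the fusion step described above. The random matrices stand in for the learned projectors, and all dimensions and variable names are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two streams over the same 1,000 images:
# a self-supervised visual embedding and an image-text embedding.
ssl_emb = rng.normal(size=(1000, 384))
clip_emb = rng.normal(size=(1000, 512))

def project(x, w):
    """Apply a linear projector into the shared space, then L2-normalise."""
    z = x @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# In the paper the projectors are learned; random matrices here only
# illustrate how two spaces of different width meet in one shared space.
w_ssl = rng.normal(size=(384, 64))
w_clip = rng.normal(size=(512, 64))

# A simple ensemble: average the two projected, normalised streams.
shared = 0.5 * (project(ssl_emb, w_ssl) + project(clip_emb, w_clip))
print(shared.shape)  # (1000, 64)
```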
Harmonising metadata across the 29 datasets, let alone adding new attributes, had not been attempted at this scale. Each archive uses its own schema, and many lack key fields entirely. One of the long-standing complaints about public dermatology data is precisely that the field could not even quantify its own demographic composition. On top of the shared manifold, a set of linear probes is trained on the partially annotated subset to predict missing demographic attributes (Fitzpatrick skin type, age, sex, geographic origin). The probes turn the sparsely annotated archive into a fully attributed one: imputed Fitzpatrick coverage expands by +97.1 pp, geography by +54.3 pp. We validate the imputation engine in two ways. On strictly held-out datasets (DDI, PAD-UFES-20), the SkinMap ensemble outperforms state-of-the-art foundation models (MONET, PanDerm) on attribute prediction. In a panel study, five practising dermatologists annotated the same 150 cases independently, and they agreed with the model's predictions more often than they agreed with each other. With this clinically validated engine in place, the field-level audit is no longer constrained to the labelled fraction.
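At its core, the imputation step is a linear probe on frozen embeddings. A hedged sketch on synthetic data, with scikit-learn's `LogisticRegression` standing in for whatever probe head the paper actually trains:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic shared-space embeddings: 600 images in a 64-d latent space,
# linearly separable by construction. The labels play the role of
# Fitzpatrick types; 400 images are annotated, 200 are not.
centers = rng.normal(size=(6, 64))
labels = rng.integers(0, 6, size=600)
emb = centers[labels] + 0.1 * rng.normal(size=(600, 64))

# Fit the probe on the annotated fraction, impute the rest.
probe = LogisticRegression(max_iter=1000).fit(emb[:400], labels[:400])
imputed = probe.predict(emb[400:])

print((imputed == labels[400:]).mean())  # near 1.0 on this easy toy data
```

On real embeddings the probe's ceiling is of course set by how well the attribute is linearly decodable from the manifold, which is exactly what the held-out and panel validations measure.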
What the audit shows
With one embedding space across the entire archive, three geometric questions become measurable: who is represented (demographic coverage), how much each new dataset actually adds (novelty), and what is structurally missing (topological voids).
Demographic biases
Pooled across all 29 datasets and using the imputation engine for missing labels, the archive remains markedly skewed. Fitzpatrick V–VI account for 11.0% of images, pediatric patients (≤ 18 years) 2.3%, and geographic concentration sits in the Global North.
Diminishing returns on new datasets
Dataset size has grown exponentially over the past decade. Novelty has not. We define yearly novelty as the share of each new dataset that lies in a region of the shared latent space not already covered by the existing archive. Across the 29-dataset chronology, novelty plateaus around the early 2020s and remains flat, even as image counts continue to rise.
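One way to operationalise that novelty definition is a nearest-neighbour coverage test: a new image counts as novel if it falls farther than some radius from every point in the existing archive. A toy sketch, where the radius, dimensions, and data are all illustrative assumptions:

```python
import numpy as np

def novelty_share(new_emb, archive_emb, radius):
    """Fraction of new embeddings whose nearest archive neighbour is
    farther than `radius`, i.e. that land in uncovered latent territory.
    (A hypothetical stand-in for the paper's coverage definition.)"""
    d = np.linalg.norm(new_emb[:, None, :] - archive_emb[None, :, :], axis=-1)
    return float((d.min(axis=1) > radius).mean())

rng = np.random.default_rng(0)
archive = rng.normal(size=(500, 8))        # the existing corpus
redundant = rng.normal(size=(100, 8))      # same distribution: replication
novel = rng.normal(size=(100, 8)) + 6.0    # shifted: genuinely new territory

print(novelty_share(redundant, archive, radius=2.0))  # low
print(novelty_share(novel, archive, radius=2.0))      # 1.0
```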
To quantify pairwise overlap directly, we compute Fréchet distances between the latent distributions of every dataset pair. Several nominally distinct datasets sit at small Fréchet distance from each other in our embedding space, which means that external validations conducted across these pairs are not genuinely out-of-distribution.
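The Fréchet distance between Gaussian fits to two point clouds has a closed form, the same one used by FID. A sketch on synthetic embedding clouds:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(x, y):
    """Fréchet distance between Gaussian fits to two embedding clouds:
    ||mu_x - mu_y||^2 + Tr(C_x + C_y - 2 (C_x C_y)^{1/2})."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    covmean = sqrtm(cov_x @ cov_y)
    if np.iscomplexobj(covmean):      # numerical noise can leave a tiny
        covmean = covmean.real        # imaginary part; drop it
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x + cov_y - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(size=(2000, 16))
b = rng.normal(size=(2000, 16))        # same distribution as a
c = rng.normal(size=(2000, 16)) + 3.0  # shifted: genuinely distinct

print(frechet_distance(a, b))  # near zero: not an independent test set
print(frechet_distance(a, c))  # large: genuinely out-of-distribution
```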
Pairwise Fréchet distance between the 29 datasets in the shared embedding space. Smaller distances (lighter cells) mean two datasets sit near each other in latent space and therefore do not constitute genuinely independent test sets for each other.
Several nominally distinct datasets share small Fréchet distance. External validation between such pairs reports similarity rather than out-of-distribution generalisation.
Structural voids
Some gaps are not just about who or what is missing in absolute counts. They are topological: regions of the embedding space where no clinical phenotype lives, surrounded by populated regions. We apply spectral persistent homology to detect these voids directly in the latent geometry, without relying on any metadata.
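To convey the idea of an empty region enclosed by populated ones, here is a deliberately crude 2-d proxy: grid occupancy plus a flood fill from the boundary. This is not the spectral persistent homology the paper uses; it is only a sketch of the kind of structure that method detects:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Toy 2-d latent cloud: points on a ring, so the centre is empty but
# surrounded by populated territory -- the kind of structure persistent
# homology flags as a hole.
theta = rng.uniform(0.0, 2.0 * np.pi, size=4000)
radius = rng.normal(2.0, 0.15, size=4000)
pts = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)

# Grid occupancy over a box containing the cloud.
bins = 24
H, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=bins,
                         range=[[-3.0, 3.0], [-3.0, 3.0]])
occ = H > 0

# Flood-fill the empty cells reachable from the grid boundary; any
# empty cell left over is enclosed by occupied cells, i.e. a void.
outside = np.zeros_like(occ)
q = deque()
for i in range(bins):
    for j in range(bins):
        if (i in (0, bins - 1) or j in (0, bins - 1)) and not occ[i, j]:
            outside[i, j] = True
            q.append((i, j))
while q:
    i, j = q.popleft()
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < bins and 0 <= nj < bins \
                and not occ[ni, nj] and not outside[ni, nj]:
            outside[ni, nj] = True
            q.append((ni, nj))

void_cells = ~occ & ~outside
print(void_cells.sum() > 0)  # True: the ring's interior is a void
```

Persistent homology does the same job without a grid and in higher dimensions, tracking holes across scales so that noise-sized gaps are filtered out and only persistent voids survive.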
The most prominent void in the dermatology atlas is nail pathology. Of about 7,800 nail-related images, the bulk concentrates on unspecified nail disease, psoriasis, and onychomycosis. Subungual melanoma, yellow nail syndrome, onychogryphosis, trachyonychia, glomus tumors, koilonychia, habit-tic deformity, myxoid cysts, and Beau's lines are absent or severely underrepresented (under 200 images each). We do not observe a visual phenotype for these conditions in the global archive, so the scarcity is not explained by missing labels.
Using the atlas
Beyond the diagnostic audit, the same shared embedding space drives three immediate applications.
Clinician-facing semantic search. Uploaded cases are projected onto the manifold in real time, and the system returns clinically similar cases drawn from the entire 1.1 million-image archive. Retrieval is no longer constrained to a single dataset.
Structural overlap audit. Pairwise Fréchet distances quantify which datasets are genuinely independent. External validations using nominally distinct but representationally similar datasets can be flagged before they inflate reported generalisation.
Strategic data acquisition. The novelty curve and persistent-homology voids identify where the next dataset should focus. Healthy skin in Fitzpatrick V–VI, pediatric cases, and rare nail pathologies are concrete acquisition targets where the atlas predicts the largest marginal coverage gain.
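The retrieval step behind the semantic search can be sketched as cosine-similarity nearest neighbours over the shared embeddings. A minimal stand-in, not the production system, with all sizes and names illustrative:

```python
import numpy as np

def top_k(query_emb, archive_emb, k=5):
    """Indices of the k most similar archive embeddings by cosine
    similarity -- a hypothetical stand-in for the atlas's retrieval."""
    q = query_emb / np.linalg.norm(query_emb)
    a = archive_emb / np.linalg.norm(archive_emb, axis=1, keepdims=True)
    return np.argsort(-(a @ q))[:k]

rng = np.random.default_rng(0)
archive = rng.normal(size=(10_000, 64))  # stand-in for the 1.1M archive
query = archive[42] + 0.01 * rng.normal(size=64)  # near-duplicate query

print(top_k(query, archive)[0])  # 42: the closest case ranks first
```

A real deployment would swap the brute-force matrix product for an approximate nearest-neighbour index, but the geometry being queried is the same.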
The complete SkinMap framework, including the model ensemble, the imputation engine, and the digital atlas, will be released open source alongside a live web app for clinical and research use.
Resources
- SkinMap. "A Global Atlas of Digital Dermatology to Map Innovation and Disparities". Under review. arXiv 2601.00840
- Source code and live digital atlas: open-source release pending acceptance.
BibTeX
@article{groger2025skinmap,
title = {A Global Atlas of Digital Dermatology to Map Innovation and Disparities},
author = {Gr{\"o}ger, Fabian and Lionetti, Simone and Gottfrois, Philippe and
Gonzalez-Jimenez, Alvaro and Habermacher, Lea and Amruthalingam, Ludovic and
Groh, Matthew and Pouly, Marc and Navarini, Alexander A. and
{Labelling Consortium}},
journal = {Under review},
year = {2025},
note = {arXiv preprint arXiv:2601.00840}
}