Clinical data that connects the full workflow.

5M+ healthcare records across prescription digitisation, diagnostic reasoning, radiology, and pathology. Linked drug data adds symptom, disease, and side-effect grounding.

Not a narrow task set.
A connected healthcare stack.
Capability Prescription Diagnostic Reports Drug data Zstate Healthcare
Structured digitisation output ~
Clinical reasoning tasks ~ ~
Radiology interpretation ~
Pathology interpretation ~
Symptoms and disease context ~ ~
Side effects and drug metadata ~
Cross-task medication grounding ~ ~ ~

One corpus.
Two layers.

Healthcare corpus of 5M+ records

A large healthcare dataset covering prescription digitisation, diagnostic reasoning, radiology report interpretation, and pathology report interpretation. Built to preserve real medical language, messy clinical inputs, and task-specific outputs.

  • Prescription extraction and normalization from difficult source material
  • Diagnostic reasoning tasks with richer disease and symptom context
  • Radiology and pathology report understanding instead of single-field labels
5M+ records · Clinical workflows · Report interpretation

Drug context that completes the dataset

A drug dataset linked to symptoms, diseases, side effects, and medication entities. It turns the core corpus from isolated task data into a connected substrate for training grounded medical systems.

  • Medication entities connected to symptoms and disease context
  • Side effects and drug attributes available as grounding signals
  • Better retrieval, evaluation, and reasoning around prescriptions and reports
Drug knowledge · Symptoms · Side effects
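
To make the two-layer idea concrete, here is a minimal sketch of how an extracted medication could be joined to the drug layer. Everything in it is an assumption for illustration: the class names, field names, the `ground` helper, and the sample values are hypothetical and do not reflect the actual dataset schema.

```python
from dataclasses import dataclass, field


@dataclass
class DrugEntry:
    # Hypothetical drug-layer record; field names are illustrative only.
    drug_id: str
    name: str
    indications: list[str] = field(default_factory=list)  # linked symptoms/diseases
    side_effects: list[str] = field(default_factory=list)


@dataclass
class PrescriptionItem:
    # Hypothetical output of prescription digitisation.
    raw_text: str
    drug_id: str  # normalized medication entity
    dosage: str


def ground(item: PrescriptionItem, drugs: dict[str, DrugEntry]) -> dict:
    """Attach drug-layer context to one extracted medication, if a match exists."""
    entry = drugs.get(item.drug_id)
    if entry is None:
        # No matching drug entry: keep the raw text and flag it as ungrounded.
        return {"raw_text": item.raw_text, "grounded": False}
    return {
        "raw_text": item.raw_text,
        "grounded": True,
        "drug": entry.name,
        "dosage": item.dosage,
        "indications": entry.indications,
        "side_effects": entry.side_effects,
    }


# Made-up example values, not taken from the dataset.
drugs = {"D001": DrugEntry("D001", "paracetamol", ["fever", "pain"], ["nausea"])}
item = PrescriptionItem(raw_text="Tab PCM 500mg 1-0-1", drug_id="D001", dosage="500 mg")
result = ground(item, drugs)
```

The join key here is a normalized drug identifier, which is one plausible way for the core corpus and the drug layer to share medication entities; the real linkage may differ.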

Built for the healthcare tasks that actually connect.

Prescription digitisation

Messy prescription inputs turned into structured medication signals, dosage understanding, and normalized entities.

OCR cleanup · Normalization

Diagnostic reasoning

Clinical reasoning tasks designed for models that need to connect symptoms, disease hypotheses, and medication context.

Reasoning · Clinical QA

Radiology interpretation

Report understanding that can support extraction, evaluation, and downstream medical AI workflows over imaging narratives.

Reports · Imaging

Pathology interpretation

Pathology report coverage for systems that need to reason over findings, impressions, and disease-linked clinical language.

Pathology · Findings

Drug knowledge grounding

Drug data linked to symptoms, diseases, and side effects that can ground the rest of the clinical corpus and complete the loop.

Drug data · Side effects

Linked subsets for training and eval

The corpus is modular: teams can work with one workflow family or package linked subsets that span extraction, reasoning, and grounding.

SFT · Evaluation

Ready to see the healthcare dataset schema?

We can package a sample around prescription digitisation, diagnostic reasoning, report interpretation, and the linked drug context layer, depending on what your training pipeline needs first.

Request schema + sample · Discuss a custom subset