5M+ healthcare records across prescription digitisation, diagnostic reasoning, radiology, and pathology. Linked drug data adds symptom, disease, and side-effect grounding.
Not a narrow task set. A connected healthcare stack.
| Capability | Prescription | Diagnostic | Reports | Drug data | Zstate Healthcare |
| --- | --- | --- | --- | --- | --- |
| Structured digitisation output | ✓ | ✕ | ~ | ✕ | ✓ |
| Clinical reasoning tasks | ✕ | ✓ | ~ | ~ | ✓ |
| Radiology interpretation | ✕ | ~ | ✓ | ✕ | ✓ |
| Pathology interpretation | ✕ | ~ | ✓ | ✕ | ✓ |
| Symptoms and disease context | ~ | ✓ | ~ | ✓ | ✓ |
| Side effects and drug metadata | ✕ | ~ | ✕ | ✓ | ✓ |
| Cross-task medication grounding | ~ | ~ | ~ | ✓ | ✓ |
One corpus. Two layers.
5M+ record healthcare corpus
A large healthcare dataset covering prescription digitisation, diagnostic reasoning, radiology report interpretation, and pathology report interpretation. Built to preserve real medical language, messy clinical inputs, and task-specific outputs.
Prescription extraction and normalization from difficult source material
Diagnostic reasoning tasks with richer disease and symptom context
Radiology and pathology report understanding instead of single-field labels
A drug dataset that links medication entities to symptoms, diseases, and side effects. It turns the core corpus from isolated task data into a connected substrate for training grounded medical systems.
Medication entities connected to symptoms and disease context
Side effects and drug attributes available as grounding signals
Better retrieval, evaluation, and reasoning around prescriptions and reports
Drug knowledge · Symptoms · Side effects
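One way to picture the two layers is as task records that carry medication mentions, plus a drug table those mentions resolve against. The sketch below is illustrative only: `CorpusRecord`, `DrugEntry`, `ground`, and every field name are hypothetical, not the dataset's actual schema, and the sample values are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical record shapes; the real schema is not published on this page.
@dataclass
class DrugEntry:
    name: str
    diseases: list[str] = field(default_factory=list)
    symptoms: list[str] = field(default_factory=list)
    side_effects: list[str] = field(default_factory=list)

@dataclass
class CorpusRecord:
    task: str  # e.g. "prescription", "diagnostic", "radiology", "pathology"
    text: str
    medications: list[str] = field(default_factory=list)

def ground(record: CorpusRecord, drugs: dict[str, DrugEntry]) -> dict[str, DrugEntry]:
    """Return the drug-layer entry for each medication the record mentions.

    Lookup is case-insensitive; mentions with no drug-layer entry are skipped.
    """
    return {m: drugs[m.lower()] for m in record.medications if m.lower() in drugs}

# Placeholder data, not real clinical content.
drugs = {"drug-a": DrugEntry(name="drug-a", symptoms=["symptom-x"], side_effects=["effect-y"])}
record = CorpusRecord(task="prescription", text="...", medications=["Drug-A", "drug-b"])
grounded = ground(record, drugs)
```

Here `grounded` holds an entry for `"Drug-A"` only, since `"drug-b"` has no match in the placeholder drug table.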
Built for the healthcare tasks that actually connect.
Prescription digitisation
Messy prescription inputs turned into structured medication signals, dosage understanding, and normalized entities.
OCR cleanup · Normalization
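As a rough illustration of what "structured medication signals" could look like, the sketch below parses one cleaned-up prescription line into a drug name, a strength, and a dosing frequency. It is a minimal example under stated assumptions: `MedicationLine`, `parse_line`, the field names, and the regular expression are all hypothetical, and real inputs would need far more robust handling than one pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical output shape; field names are illustrative only.
@dataclass
class MedicationLine:
    drug: str
    strength_mg: float
    doses_per_day: Optional[int]

# Common prescription frequency abbreviations mapped to doses per day.
FREQ = {"od": 1, "bd": 2, "bid": 2, "tds": 3, "tid": 3, "qid": 4}

LINE = re.compile(
    r"^(?:(?:tab|cap|syp|inj)\.?\s+)?"    # optional dosage-form prefix
    r"(?P<drug>[a-z][a-z\- ]*?)\s+"       # drug name, lazy so it stops before the strength
    r"(?P<strength>\d+(?:\.\d+)?)\s*mg"   # strength in milligrams
    r"(?:\s+(?P<freq>[a-z]+))?",          # optional frequency token
    re.IGNORECASE,
)

def parse_line(raw: str) -> Optional[MedicationLine]:
    """Parse one prescription line; return None when the pattern does not match."""
    match = LINE.match(" ".join(raw.split()))  # collapse irregular whitespace first
    if match is None:
        return None
    freq = (match.group("freq") or "").lower()
    return MedicationLine(
        drug=match.group("drug").strip().lower(),
        strength_mg=float(match.group("strength")),
        doses_per_day=FREQ.get(freq),  # None for unknown or missing frequency
    )

parsed = parse_line("Tab.  Drug-A   500 mg  BD")  # placeholder drug name
```

Lines that do not fit the pattern return `None` rather than a guess, which keeps unparsed inputs visible for review.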
Diagnostic reasoning
Clinical reasoning tasks designed for models that need to connect symptoms, disease hypotheses, and medication context.
Reasoning · Clinical QA
Radiology interpretation
Report understanding that can support extraction, evaluation, and downstream medical AI workflows over imaging narratives.
Reports · Imaging
Pathology interpretation
Pathology report coverage for systems that need to reason over findings, impressions, and disease-linked clinical language.
Pathology · Findings
Drug knowledge grounding
Drug data linked to symptoms, diseases, and side effects that can ground the rest of the clinical corpus and complete the loop.
Drug data · Side effects
Linked subsets for training and eval
The corpus is modular: teams can work with one workflow family or package linked subsets that span extraction, reasoning, and grounding.
SFT · Evaluation
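A team packaging a subset might pick the workflow families it needs and then hold out a stable evaluation slice. The sketch below shows one possible approach, not how the dataset is actually delivered: the record keys `id` and `task`, the function names, and the hash-based split are all assumptions made for illustration.

```python
import hashlib

def select_families(records, families):
    """Keep only records whose (hypothetical) 'task' key is in the chosen families."""
    return [r for r in records if r["task"] in families]

def split_sft_eval(records, eval_fraction=0.1):
    """Split deterministically into (sft, eval) by hashing each record's 'id'.

    The same id always lands on the same side, so the eval slice stays
    stable when more records are added later.
    """
    sft, held_out = [], []
    for r in records:
        digest = hashlib.sha256(r["id"].encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        (held_out if bucket < eval_fraction else sft).append(r)
    return sft, held_out

# Placeholder records, not real data.
records = [
    {"id": f"rec-{i}", "task": task}
    for i, task in enumerate(["prescription", "diagnostic", "radiology", "pathology"] * 5)
]
subset = select_families(records, {"prescription", "diagnostic"})
sft, held_out = split_sft_eval(subset, eval_fraction=0.2)
```

Hashing the id instead of shuffling means the split needs no stored random seed and does not change between runs.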
Ready to see the healthcare dataset schema?
We can package a sample around prescription digitisation, diagnostic reasoning, report interpretation, and the linked drug context layer, depending on what your training pipeline needs first.