I bring more than ten years inside healthcare, spanning payers, hospitals, and health tech, to building the data systems and pipelines that help health organizations actually use their data. My field is data science; I'm targeting Senior Data Analyst, ETL Developer, and Data Integration Specialist roles, with Data Engineer as my end goal.
Ten years inside healthcare taught me one thing clearly: the data was always there. The infrastructure to trust it wasn't.
I work with healthcare data to answer the questions that drive real decisions: validating claims, building cost and utilization analyses, tracking patient populations over time, calculating the quality metrics health plans are held accountable to, and delivering dashboards that clinical and operational teams actually use.
Underneath all of that, I build and maintain the pipelines that move raw data out of source systems, clean it, connect it, and load it somewhere structured, reliable, and ready to use. Without that foundation, nothing downstream is accurate.
I didn't learn healthcare from the outside. I lived it: revenue cycle, claims analytics, clinical operations, health tech. I know what the data is supposed to say before I ever touch it.
Six healthcare data projects spanning ETL pipelines, data warehousing, clinical ML, and population health dashboards, all built on real public datasets across payer, clinical, and public health domains.
Built a production-style data engineering pipeline ingesting adverse drug event reports from the openFDA API into an ICH E2B(R3)-aligned MySQL schema. Designed a normalized relational data model capturing drug, reaction, patient, and report entities. Surfaced drug safety signals through a Flask/Plotly interactive dashboard enabling exploration by drug class, reaction type, and report volume over time.
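As a flavor of the ingestion work, here is a minimal sketch of flattening one openFDA drug event report into per-table rows. The field names (`safetyreportid`, `medicinalproduct`, `reactionmeddrapt`) follow the openFDA drug/event payload; the row layout itself is illustrative, not the project's actual E2B(R3)-aligned schema, and the sample report is invented.

```python
def normalize_report(report: dict) -> dict:
    """Split a nested openFDA drug event report into per-table row lists."""
    patient = report.get("patient", {})
    rid = report.get("safetyreportid")
    report_row = {
        "safetyreportid": rid,
        "receivedate": report.get("receivedate"),
        "serious": report.get("serious"),
    }
    patient_row = {
        "safetyreportid": rid,
        "patientsex": patient.get("patientsex"),
        "patientonsetage": patient.get("patientonsetage"),
    }
    # one row per drug and per reaction, keyed back to the report
    drug_rows = [
        {"safetyreportid": rid, "medicinalproduct": d.get("medicinalproduct")}
        for d in patient.get("drug", [])
    ]
    reaction_rows = [
        {"safetyreportid": rid, "reactionmeddrapt": r.get("reactionmeddrapt")}
        for r in patient.get("reaction", [])
    ]
    return {"report": report_row, "patient": patient_row,
            "drugs": drug_rows, "reactions": reaction_rows}

sample = {
    "safetyreportid": "10003301",
    "receivedate": "20140312",
    "serious": "1",
    "patient": {
        "patientsex": "2",
        "patientonsetage": "56",
        "drug": [{"medicinalproduct": "LISINOPRIL"}],
        "reaction": [{"reactionmeddrapt": "Angioedema"},
                     {"reactionmeddrapt": "Dyspnoea"}],
    },
}
rows = normalize_report(sample)
print(len(rows["drugs"]), len(rows["reactions"]))  # 1 2
```

Normalizing this way keeps drug and reaction facts in their own tables while `safetyreportid` ties everything back to the source report.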
Developed a machine learning pipeline to classify clinically significant drug-drug interactions using the TwoSIDES pharmacovigilance dataset. Engineered features from adverse event co-occurrence patterns and trained logistic regression and random forest classifiers. Evaluated with AUC-ROC, precision-recall curves, and cross-validation with a focus on minimizing false negatives given clinical risk implications.
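One feature-engineering step can be sketched as counting how often each drug pair co-occurs with each adverse event across reports. The real project derives features from the TwoSIDES dataset; the toy reports below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy adverse event reports: each lists the drugs taken and events observed.
reports = [
    {"drugs": {"warfarin", "aspirin"}, "events": {"bleeding"}},
    {"drugs": {"warfarin", "aspirin"}, "events": {"bleeding", "nausea"}},
    {"drugs": {"warfarin"}, "events": {"nausea"}},  # single drug: no pairs
]

# Count (drug pair, event) co-occurrences; sorting makes pairs order-stable.
pair_event_counts = Counter()
for r in reports:
    for pair in combinations(sorted(r["drugs"]), 2):
        for event in r["events"]:
            pair_event_counts[(pair, event)] += 1

print(pair_event_counts[(("aspirin", "warfarin"), "bleeding")])  # 2
```

Counts like these become the raw material for classifier features; the trained models then weigh them against each pair's overall report volume.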
Built a cardiovascular disease risk prediction model using logistic regression across three public clinical datasets. Performed data harmonization, missing value imputation, and feature selection to produce a unified modeling dataset. Evaluated model calibration and discrimination across demographic subgroups to assess equity implications of risk score deployment in clinical settings.
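The harmonization step can be sketched as mapping each source's column names onto a shared schema, then mean-imputing missing values. The column names and values below are invented; the actual datasets and fields differ.

```python
# Hypothetical per-source column mappings to a shared schema.
COLUMN_MAPS = {
    "dataset_a": {"age_yrs": "age", "chol_mgdl": "cholesterol"},
    "dataset_b": {"Age": "age", "Cholesterol": "cholesterol"},
}

def harmonize(rows, source):
    """Rename a source's columns to the shared schema, dropping extras."""
    mapping = COLUMN_MAPS[source]
    return [{mapping[k]: v for k, v in row.items() if k in mapping}
            for row in rows]

def mean_impute(rows, field):
    """Replace missing values in `field` with the mean of observed values."""
    vals = [r[field] for r in rows if r[field] is not None]
    mean = sum(vals) / len(vals)
    return [{**r, field: r[field] if r[field] is not None else mean}
            for r in rows]

combined = (harmonize([{"age_yrs": 54, "chol_mgdl": 230}], "dataset_a")
            + harmonize([{"Age": 61, "Cholesterol": None}], "dataset_b"))
combined = mean_impute(combined, "cholesterol")
print(combined[1]["cholesterol"])  # 230.0
```

With every source in one schema and no missing values, the unified dataset can feed a single modeling pipeline instead of three incompatible ones.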
Built a two-layer PostgreSQL data warehouse on real CMS Medicare claims data: a raw layer where data lands as-is and a clean layer where it's transformed and structured. Wrote a Python pipeline that downloads, cleans, and loads data automatically. Produced ten progressively complex SQL queries covering window functions, CTEs, and PMPM calculations. Final deliverable is a live Looker Studio dashboard with four panels: executive scorecard, PMPM trend, claims by service line, and top high-cost members with date and service line filters.
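The PMPM (per member per month) metric behind the trend panel reduces to a simple ratio: paid amount in a month divided by that month's member-months. A minimal sketch, with invented claims and enrollment figures:

```python
# Toy claim lines: member, incurred month, paid amount.
claims = [
    {"member_id": "M1", "month": "2023-01", "paid": 400.0},
    {"member_id": "M2", "month": "2023-01", "paid": 150.0},
    {"member_id": "M1", "month": "2023-02", "paid": 50.0},
]
# Member-months: how many members were enrolled in each month.
enrollment = {"2023-01": 2, "2023-02": 2}

def pmpm(claims, enrollment, month):
    """Total paid for the month divided by member-months in that month."""
    paid = sum(c["paid"] for c in claims if c["month"] == month)
    return paid / enrollment[month]

print(pmpm(claims, enrollment, "2023-01"))  # 275.0
```

In the warehouse itself this is a SQL aggregation over the clean layer, but the arithmetic is the same: the denominator is enrollment, not claimants, so members with zero claims still pull the rate down.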
A fully automated three-stage Prefect pipeline that runs on a schedule — Extract pulls fresh CMS data, Transform applies data quality checks and logs bad records to an error table, Load pushes clean data incrementally into PostgreSQL so only new records are added each run. Includes retry logic for stage failures. A Plotly Dash monitoring dashboard surfaces pipeline run history, records processed per run, error rate over time, and a data quality summary panel — the view an ops team or data manager checks every morning.
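The incremental-load idea in the Load stage can be sketched as a high-water-mark check: only records beyond the last loaded id are appended, so reruns never duplicate data. Table and field names here are illustrative, not the pipeline's actual schema.

```python
def incremental_load(target: list, incoming: list) -> int:
    """Append only records past the current high-water mark; return count added."""
    watermark = max((r["id"] for r in target), default=0)
    new_records = [r for r in incoming if r["id"] > watermark]
    target.extend(new_records)
    return len(new_records)

# A rerun that overlaps already-loaded data only adds the genuinely new rows.
warehouse = [{"id": 1}, {"id": 2}]
batch = [{"id": 2}, {"id": 3}, {"id": 4}]
added = incremental_load(warehouse, batch)
print(added, len(warehouse))  # 2 4
```

The same property is what makes the retry logic safe: a stage that fails partway and reruns picks up from the watermark instead of double-loading.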
Cloud-native data integration platform on Snowflake with three schemas — raw, staging, and analytics. A Python ingestion script loads CMS data into the raw schema via the Snowflake connector. SQL transformation scripts promote data through staging to analytics, cleaning, joining, and computing metrics at each layer. Snowflake Tasks automate transformations on a schedule. Three live Tableau Public dashboards serve as the front end: an executive scorecard with PMPM trend and cost variance, a clinical operations dashboard with readmission flags and ER utilization, and a payer analytics dashboard with denial rate by provider and utilization by service line.
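The staging-to-analytics promotion can be illustrated in miniature: normalize raw claim rows, then compute denial rate by provider, one of the payer dashboard's metrics. Field names and values below are invented; the real transformations are SQL scripts run by Snowflake Tasks.

```python
# Raw layer: data as it lands, with inconsistent casing and whitespace.
raw = [
    {"provider": "P100", "status": "DENIED "},
    {"provider": "P100", "status": "paid"},
    {"provider": "P200", "status": "Paid"},
    {"provider": "P200", "status": "denied"},
]

# Staging layer: normalized values and a typed denial flag.
staging = [
    {"provider": r["provider"],
     "denied": r["status"].strip().lower() == "denied"}
    for r in raw
]

# Analytics layer: denial rate per provider.
def denial_rate(rows, provider):
    claims = [r for r in rows if r["provider"] == provider]
    return sum(r["denied"] for r in claims) / len(claims)

print(denial_rate(staging, "P100"))  # 0.5
```

Keeping the layers separate means the dashboards only ever read from analytics, while raw preserves an untouched copy for auditing and reprocessing.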
Structured analyses of real-world healthcare challenges through an informatics, ethics, and policy lens.
Examines whether telemedicine can address access, cost, and continuity-of-care challenges in Jamaica, particularly for rural and underserved communities, while remaining aligned with health informatics principles.
Investigates how healthcare organizations can improve their cybersecurity posture to reduce the risk and impact of cyberattacks while protecting sensitive patient data and maintaining clinical operations.
Examines whether CDSS tools consistently prioritize patient care over financial interests as they become increasingly commercialized, analyzing IBM Watson for Oncology and Epic's sepsis model as real-world failures.
Built through graduate coursework, self-directed learning, and applied project work across the full healthcare data lifecycle.
A decade of healthcare data and operations, now focused on data analytics and engineering.
If you're building data infrastructure with real clinical impact, let's connect.