Healthcare Data
Professional
→ Data Engineer

I bring 10+ years inside healthcare, across payers, hospitals, and health tech, to building the data systems and pipelines that help health organizations actually use their data. I work in data science and am targeting Senior Data Analyst, ETL Developer, and Data Integration Specialist roles, with Data Engineer as my end goal.

🏥
Domain Experience
10+ yrs · Payers, Hospitals, Health Tech, Clinical Practices
🎓
Education
MS Health Informatics — Hofstra University
2026
🔬
Current Internship
Northwell Health — LLM Evaluation & Clinical AI
⚙️
Focus
Data Analytics · ETL Pipelines · Health Data
TaraJee Clarke

Healthcare Ops Veteran
Turned Data Professional

Ten years inside healthcare taught me one thing clearly: the data was always there. The infrastructure to trust it wasn't.

I work with healthcare data to answer the questions that drive real decisions: validating claims, building cost and utilization analyses, tracking patient populations over time, calculating the quality metrics health plans are held accountable to, and delivering dashboards that clinical and operational teams actually use.

Underneath all of that, I build and maintain the pipelines that move raw data out of source systems, clean it, connect it, and load it somewhere structured, reliable, and ready to use. Without that foundation, nothing downstream is accurate.

I didn't learn healthcare from the outside. I lived it: revenue cycle, claims analytics, clinical operations, health tech. I know what the data is supposed to say before I ever touch it.

Location
New York — open to relocate
Open To
Remote · Hybrid
Available For
Senior Data Analyst · ETL Developer · Data Integration Specialist
Prior Experience Spans
Claims analytics, Clinical operations, Health IT
Industries
Payers · Hospitals · Health Tech Startups
Leadership
VP, Health Technology Analytics & Innovation — Hofstra HTAi
Portfolio

Selected Projects

6 healthcare data projects spanning ETL pipelines, data warehousing, clinical ML, and population health dashboards, all built on real public datasets across payer, clinical, and public health domains.

02
Complete

Drug Interaction ML Classification Pipeline

Developed a machine learning pipeline to classify clinically significant drug-drug interactions using the TwoSIDES pharmacovigilance dataset. Engineered features from adverse event co-occurrence patterns and trained logistic regression and random forest classifiers. Evaluated with AUC-ROC, precision-recall curves, and cross-validation with a focus on minimizing false negatives given clinical risk implications.

100K+
Records Processed
ROC-AUC ~0.69
Model Performance
2 Models
LR + Random Forest
Python · scikit-learn · pandas · TwoSIDES · Logistic Regression · Random Forest
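
To illustrate the evaluation idea described in this project (ROC-AUC plus attention to false negatives), here is a minimal standard-library sketch. It is not the project's code: the helper names `roc_auc` and `recall_at` are hypothetical, and the labels and scores are made-up toy values, not TwoSIDES results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of true interactions flagged at a threshold (1 minus the false negative rate)."""
    flagged = [s >= threshold for y, s in zip(labels, scores) if y == 1]
    if not flagged:
        raise ValueError("need at least one positive label")
    return sum(flagged) / len(flagged)


# Toy values for illustration only (hypothetical, not model output).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(y_true, y_score)
recall_default = recall_at(y_true, y_score, 0.5)
recall_lowered = recall_at(y_true, y_score, 0.35)
print(f"AUC={auc:.3f} recall@0.50={recall_default:.2f} recall@0.35={recall_lowered:.2f}")
```

Lowering the decision threshold trades more false alerts for fewer missed interactions, which is the trade-off a clinical-risk framing usually pushes toward.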
03
Complete

Cardiovascular Disease Risk Prediction

Built a cardiovascular disease risk prediction model using logistic regression across three public clinical datasets. Performed data harmonization, missing value imputation, and feature selection to produce a unified modeling dataset. Evaluated model calibration and discrimination across demographic subgroups to assess equity implications of risk score deployment in clinical settings.

3 Datasets
Framingham · UCI · Cardio Train
Cross-validated
Generalizability Test
Equity Analysis
Subgroup Fairness
Python · scikit-learn · pandas · Logistic Regression · Model Calibration
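
One simple way to look at calibration across subgroups is to compare the mean predicted risk with the observed event rate in each group. The sketch below assumes a hypothetical row shape of (subgroup, predicted risk, outcome) and uses made-up values; it is an illustration of the idea, not the analysis used in the project.

```python
from collections import defaultdict


def subgroup_calibration(rows):
    """Return {group: (mean predicted risk, observed event rate, n)}."""
    buckets = defaultdict(list)
    for group, predicted, outcome in rows:
        buckets[group].append((predicted, outcome))
    summary = {}
    for group, pairs in buckets.items():
        n = len(pairs)
        mean_pred = sum(p for p, _ in pairs) / n
        observed = sum(o for _, o in pairs) / n
        summary[group] = (mean_pred, observed, n)
    return summary


# Made-up rows: (subgroup label, predicted risk, observed outcome 0/1).
rows = [
    ("A", 0.2, 0), ("A", 0.4, 1), ("A", 0.6, 1), ("A", 0.8, 1),
    ("B", 0.2, 0), ("B", 0.4, 0), ("B", 0.6, 0), ("B", 0.8, 1),
]

summary = subgroup_calibration(rows)
for group, (mean_pred, observed, n) in sorted(summary.items()):
    # A positive gap means the model over-predicts risk for that group.
    gap = mean_pred - observed
    print(f"{group}: predicted={mean_pred:.2f} observed={observed:.2f} gap={gap:+.2f} n={n}")
```

A gap that differs in sign or size between groups is a signal that a single risk threshold may not treat those groups equally.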
04
In Progress

Medicare Claims Data Warehouse & Analytics Pipeline

Built a two-layer PostgreSQL data warehouse on real CMS Medicare claims data: a raw layer where data lands as-is and a clean layer where it's transformed and structured. Wrote a Python pipeline that downloads, cleans, and loads data automatically. Produced ten progressively complex SQL queries covering window functions, CTEs, and PMPM calculations. Final deliverable is a live Looker Studio dashboard with four panels (executive scorecard, PMPM trend, claims by service line, and top high-cost members), with date and service line filters.

2-Layer
Raw + Clean Warehouse
PMPM + CTEs
Advanced SQL
Looker Studio
Live Dashboard
Python · pandas · PostgreSQL · Advanced SQL · Looker Studio · CMS Medicare
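
PMPM (per member per month) is total paid amount divided by member months for the same period. A minimal sketch of that arithmetic follows, assuming hypothetical input shapes (a list of `(month, paid_amount)` claims and a `{month: member_months}` enrollment map) and made-up figures rather than CMS data.

```python
from collections import defaultdict


def pmpm_by_month(claims, enrollment):
    """claims: iterable of (month, paid_amount); enrollment: {month: member_months}."""
    paid = defaultdict(float)
    for month, amount in claims:
        paid[month] += amount
    result = {}
    for month, member_months in sorted(enrollment.items()):
        # Skip months with no enrolled members to avoid dividing by zero.
        if member_months > 0:
            result[month] = paid[month] / member_months
    return result


# Made-up figures for illustration only.
claims = [("2024-01", 1200.0), ("2024-01", 300.0), ("2024-02", 900.0)]
enrollment = {"2024-01": 10, "2024-02": 12, "2024-03": 0}

pmpm = pmpm_by_month(claims, enrollment)
for month, value in pmpm.items():
    print(f"{month}: PMPM = {value:.2f}")
```

The denominator comes from enrollment, not from claims, so members with no claims in a month still count toward member months.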
05
In Progress

Automated Healthcare ETL Pipeline with Orchestration & Monitoring

A fully automated three-stage Prefect pipeline that runs on a schedule: Extract pulls fresh CMS data, Transform applies data quality checks and logs bad records to an error table, and Load pushes clean data incrementally into PostgreSQL so only new records are added each run. Includes retry logic for stage failures. A Plotly Dash monitoring dashboard surfaces pipeline run history, records processed per run, error rate over time, and a data quality summary panel: the view an ops team or data manager checks every morning.

3-Stage
Extract · Transform · Load
Prefect
Scheduled Orchestration
Dash Monitor
Live Pipeline Health
Python · pandas · Prefect · PostgreSQL · Plotly Dash · Incremental Load
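
The transform-and-incremental-load pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `sqlite3` stands in for PostgreSQL, the table and column names are hypothetical, and Prefect scheduling and retries are not shown. In PostgreSQL the same "only new records" behavior is commonly written with `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3


def transform(records):
    """Split records into clean rows and rejected rows with a reason."""
    clean, rejected = [], []
    for rec in records:
        if not rec.get("claim_id"):
            rejected.append((str(rec), "missing claim_id"))
        elif not isinstance(rec.get("paid"), (int, float)) or rec["paid"] < 0:
            rejected.append((str(rec), "invalid paid amount"))
        else:
            clean.append((rec["claim_id"], float(rec["paid"])))
    return clean, rejected


def load(conn, clean, rejected):
    """Insert only claim_ids not already present; log rejects to an error table."""
    before = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
    # The primary key on claim_id makes re-sent rows a no-op.
    conn.executemany("INSERT OR IGNORE INTO claims (claim_id, paid) VALUES (?, ?)", clean)
    conn.executemany("INSERT INTO load_errors (raw_record, reason) VALUES (?, ?)", rejected)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT PRIMARY KEY, paid REAL NOT NULL)")
conn.execute("CREATE TABLE load_errors (raw_record TEXT, reason TEXT)")

# Two made-up runs; the second re-sends C1 and includes one bad record.
run_1 = [{"claim_id": "C1", "paid": 100}, {"claim_id": "", "paid": 50}]
run_2 = [{"claim_id": "C1", "paid": 100}, {"claim_id": "C2", "paid": -5}, {"claim_id": "C3", "paid": 20}]

added_1 = load(conn, *transform(run_1))
added_2 = load(conn, *transform(run_2))
error_count = conn.execute("SELECT COUNT(*) FROM load_errors").fetchone()[0]
print(f"run 1 added {added_1}, run 2 added {added_2}, errors logged {error_count}")
```

Counting rows added per run and rows sent to the error table gives exactly the figures a monitoring dashboard would chart over time.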
06
In Progress

End-to-End Healthcare Data Integration Platform on Snowflake

Cloud-native data integration platform on Snowflake with three schemas — raw, staging, and analytics. A Python ingestion script loads CMS data into the raw schema via the Snowflake connector. SQL transformation scripts promote data through staging to analytics, cleaning, joining, and computing metrics at each layer. Snowflake Tasks automate transformations on a schedule. Three live Tableau Public dashboards serve as the front end: an executive scorecard with PMPM trend and cost variance, a clinical operations dashboard with readmission flags and ER utilization, and a payer analytics dashboard with denial rate by provider and utilization by service line.

3-Schema
Raw · Staging · Analytics
Snowflake Tasks
Automated Transforms
3 Dashboards
Executive · Clinical · Payer
Snowflake · Python · Advanced SQL · Tableau Public · Snowflake Tasks · CMS Medicare
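
As a rough picture of how data is promoted from raw to staging to analytics, the sketch below builds the three layers in an in-memory `sqlite3` database as a stand-in for Snowflake. The table, view, and column names are hypothetical, the rows are made up, and scheduling with Snowflake Tasks is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw layer: data lands as-is, including messy values and duplicates.
CREATE TABLE raw_claims (claim_id TEXT, provider_npi TEXT, status TEXT);
INSERT INTO raw_claims VALUES
  ('C1', ' 111 ', 'PAID'),
  ('C2', '111',   'denied'),
  ('C2', '111',   'denied'),
  ('C3', '222',   'Paid'),
  ('C4', '222',   'paid'),
  ('C5', NULL,    'paid');

-- Staging layer: trim, normalize case, drop duplicates and rows with no provider.
CREATE VIEW stg_claims AS
SELECT DISTINCT claim_id,
       TRIM(provider_npi) AS provider_npi,
       LOWER(status)      AS status
FROM raw_claims
WHERE provider_npi IS NOT NULL;

-- Analytics layer: denial rate per provider.
CREATE VIEW denial_rate_by_provider AS
SELECT provider_npi,
       COUNT(*) AS claims,
       AVG(CASE WHEN status = 'denied' THEN 1.0 ELSE 0.0 END) AS denial_rate
FROM stg_claims
GROUP BY provider_npi;
""")

rates = {
    npi: rate
    for npi, _, rate in conn.execute(
        "SELECT provider_npi, claims, denial_rate FROM denial_rate_by_provider"
    )
}
for npi, rate in sorted(rates.items()):
    print(f"provider {npi}: denial rate {rate:.0%}")
```

Keeping the cleaning rules in the staging layer means every analytics metric is computed from the same deduplicated, normalized rows.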

Health Informatics Analysis

Structured analyses of real-world healthcare challenges through an informatics, ethics, and policy lens.

Health Informatics & Strategy

Telemedicine as a Health Technology Solution in Jamaica

Examines whether telemedicine can address access, cost, and continuity-of-care challenges in Jamaica, particularly for rural and underserved communities, while remaining aligned with health informatics principles.

  • Telemedicine reduces geographic and transportation barriers, with highest impact in rural communities lacking specialist access
  • Continuity of care for chronic disease (diabetes, CVD) improves with remote monitoring and regular follow-up support
  • Effective adoption requires interoperable EHRs, digital infrastructure investment, and regulatory framework development
  • Telemedicine is a policy and change-management challenge as much as a technology one
Telemedicine · Health Equity · Interoperability · EHR · Caribbean Health Systems
Health Informatics, Security & Privacy

Strengthening Cybersecurity in Healthcare Facilities

Investigates how healthcare organizations can improve their cybersecurity posture to reduce the risk and impact of cyberattacks while protecting sensitive patient data and maintaining clinical operations.

  • Phishing, ransomware, and insider threats are the dominant attack vectors, often exploiting human behavior over technical gaps
  • EHR outages from cyberattacks directly delay patient care, making cybersecurity a patient-safety issue, not just an IT concern
  • The NIST Cybersecurity Framework is applied to structure recommendations across prevention, detection, response, and recovery
  • A multi-layered strategy combining access controls, staff training, and incident response planning reduces operational impact
Cybersecurity · NIST Framework · PHI Protection · Ransomware · Healthcare IT
Health Informatics, Ethics & Clinical Systems

Profit vs. Care: Ethical Risks in Clinical Decision Support Systems

Examines whether CDSS tools consistently prioritize patient care over financial interests as they become increasingly commercialized, with analysis of IBM Watson for Oncology and Epic's sepsis model as real-world failures.

  • Vendor financial relationships with pharma can shape CDSS algorithms toward high-cost treatments, often without clinician visibility
  • IBM Watson for Oncology recommended unsafe treatments due to narrow training data and lack of independent validation
  • Epic's sepsis model raised concerns about high false-alert rates and limited transparency into recommendation logic
  • Algorithmic transparency, independent audits, and cost-aware recommendations are essential for ethical CDSS design
CDSS · Responsible AI · Algorithm Transparency · Health Ethics · Clinical Governance
Technical Skills

Stack & Expertise

Built through graduate coursework, self-directed learning, and applied project work across the full healthcare data lifecycle.

Languages & Querying
  • SQL (Advanced — CTEs, Window Functions, PMPM)
  • Python
  • pandas / NumPy
ETL & Data Engineering
  • ETL / ELT Pipeline Design
  • Prefect (Orchestration)
  • Incremental Loading
  • REST API Integration
  • Data Quality Testing
  • Batch Pipeline Architecture
Databases & Warehousing
  • PostgreSQL / MySQL
  • Snowflake
  • DuckDB
  • Multi-layer Warehouse Design
  • Schema Design
Visualization & Dashboards
  • Tableau Public
  • Looker Studio
  • Plotly / Plotly Dash
  • Streamlit
  • Flask
  • KPI & Scorecard Design
Healthcare Domain
  • CMS Medicare / Medicaid Data
  • Claims Analytics & PMPM
  • ICD-10 / CPT / NPI / HEDIS
  • EHR / Health IT Systems
  • ICH E2B(R3)
  • SDOH Data Integration
Tools & Workflow
  • Git / GitHub
  • Docker / Docker Compose
  • VS Code
  • Jupyter Notebooks
  • scikit-learn (ML)
  • LLM Evaluation & AI QA
Experience

Career Timeline

A decade of healthcare data and operations, now focused on data analytics and engineering.

Jan 2025 – Present
Data Analyst Intern — AI & Quality Analytics
Northwell Health · Hybrid · New Hyde Park, NY
Evaluated and QA'd an LLM classifying internal healthcare topics for accuracy and workflow alignment. Identified misclassification patterns and delivered structured feedback to improve model performance. Contributed to model validation documentation and responsible AI governance frameworks.
Sept 2025 – Dec 2025
Health Informatics Intern
St. John's Episcopal Hospital · Hybrid · Far Rockaway, NY
Built SQL queries and Tableau dashboards for quality and KPI monitoring across hospital departments. Conducted workflow and performance analysis to identify operational improvement opportunities. Supported HIPAA compliance activities with direct EHR and health IT system exposure.
Oct 2023 – May 2024
Data Analyst — Claims & Operational Analytics
Alma · Remote · New York, NY
Engineered advanced SQL (CTEs, window functions) to validate high-volume claims data, reducing errors by 70%. Conducted cohort and PMPM analysis to surface cost leakage and close performance gaps. Built KPI monitoring datasets supporting operational optimization across service lines.
May 2023 – Oct 2023
Patient Intake & Clinical Analytics Coordinator
K Health · Remote · New York, NY
Analyzed intake and utilization data across Epic and Salesforce to assess workflow efficiency and patient throughput. Built SQL datasets tracking KPI stability and service-line performance trends. Conducted variance analysis to evaluate the operational impact of workflow and process changes.
Sept 2021 – May 2023
Practice Operations Coordinator
Life is Beautiful, MD · Fort Lauderdale, FL
Analyzed reimbursement and denial data to identify revenue drivers and operational bottlenecks. Developed KPI dashboards tracking revenue cycle performance and scheduling efficiency. Delivered data-driven insights that improved collections predictability and workflow performance.
Aug 2016 – Aug 2021
Data Analyst — Claims & Risk Analytics
Sagicor Life Insurance · Hybrid · Kingston, Jamaica
Analyzed multi-year claims datasets to identify cost, utilization, and risk trends for actuarial strategy. Built structured datasets supporting actuarial forecasting and long-term pricing models. Improved operational efficiency by 15% through performance monitoring and trend analysis.
Contact

Let's Work Together

If you're building data infrastructure with real clinical impact, let's connect.