TaraJee Clarke — Health Data Engineer

About

Healthcare Ops Veteran
Turned Data Professional

Ten years inside healthcare taught me one thing clearly: the data was always there. The infrastructure to trust it wasn't.

I work with healthcare data to answer the questions that drive real decisions: validating claims, building cost and utilization analyses, tracking patient populations over time, calculating the quality metrics health plans are held accountable to, and delivering dashboards that clinical and operational teams actually use.

Underneath all of that, I build and maintain the pipelines that pull raw data from source systems, clean it, connect it, and load it somewhere structured, reliable, and ready to use. Without that foundation, nothing downstream is accurate.

I didn't learn healthcare from the outside. I lived it: revenue cycle, claims analytics, clinical operations, and health tech. I know what the data is supposed to say before I ever touch it.

Portfolio

Selected Projects

Six healthcare data projects spanning ETL pipelines, data warehousing, clinical ML, and population health dashboards, built on real public datasets across payer, clinical, and public health domains.

01 · Featured

Complete

Pharmacovigilance Data Engineering Pipeline

Built a production-style data engineering pipeline ingesting adverse drug event reports from the openFDA API into an ICH E2B(R3)-aligned MySQL schema. Designed a normalized relational data model capturing drug, reaction, patient, and report entities. Surfaced drug safety signals through a Flask/Plotly interactive dashboard enabling exploration by drug class, reaction type, and report volume over time.

Python · MySQL · Flask · Plotly · openFDA API · ICH E2B(R3)

Type

Data Engineering

Standard

ICH E2B(R3) regulatory schema

Output

Interactive signal detection dashboard

Key Skills

REST API ingestion · Schema design · Regulatory standards · Dashboard development
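The core of loading nested API responses into a normalized relational model is flattening each report into one row per entity. The sketch below assumes a single result record shaped like the openFDA drug adverse event endpoint (fields such as `safetyreportid`, `receivedate`, `patient.drug[].medicinalproduct`, and `patient.reaction[].reactionmeddrapt`). The function name `normalize_event`, the four row shapes, and the sample record are hypothetical illustrations, not the project's actual schema or data.

```python
from __future__ import annotations

from typing import Any


def normalize_event(event: dict[str, Any]) -> dict[str, list[tuple]]:
    """Split one nested openFDA-style adverse event record into flat table rows.

    Hypothetical row shapes, one list per target table:
      report:   (report_id, receive_date, serious)
      patient:  (report_id, sex, onset_age)
      drug:     (report_id, drug_name, characterization)
      reaction: (report_id, meddra_term)
    """
    report_id = event["safetyreportid"]
    patient = event.get("patient", {})

    rows: dict[str, list[tuple]] = {
        "report": [(report_id, event.get("receivedate"), event.get("serious"))],
        "patient": [(report_id, patient.get("patientsex"), patient.get("patientonsetage"))],
        "drug": [],
        "reaction": [],
    }
    # A report can list several drugs (suspect, concomitant, interacting): one row each.
    for drug in patient.get("drug", []):
        rows["drug"].append(
            (report_id, drug.get("medicinalproduct"), drug.get("drugcharacterization"))
        )
    # Each reported reaction is a MedDRA preferred term: one row each.
    for reaction in patient.get("reaction", []):
        rows["reaction"].append((report_id, reaction.get("reactionmeddrapt")))
    return rows


# Made-up record for illustration only (not real openFDA data).
sample = {
    "safetyreportid": "10000001",
    "receivedate": "20240115",
    "serious": "1",
    "patient": {
        "patientsex": "2",
        "patientonsetage": "67",
        "drug": [
            {"medicinalproduct": "WARFARIN", "drugcharacterization": "1"},
            {"medicinalproduct": "ASPIRIN", "drugcharacterization": "2"},
        ],
        "reaction": [{"reactionmeddrapt": "Haemorrhage"}],
    },
}
rows = normalize_event(sample)
for table, table_rows in rows.items():
    print(table, table_rows)
```

Keeping `report_id` on every row is what lets the drug and reaction tables join back to the report, so drug/reaction co-occurrence counts for signal detection become simple SQL aggregates.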

Complete

Drug Interaction ML Classification Pipeline

Developed a machine learning pipeline to classify clinically significant drug-drug interactions using the TwoSIDES pharmacovigilance dataset. Engineered features from adverse event co-occurrence patterns and trained logistic regression and random forest classifiers. Evaluated with ROC-AUC, precision-recall curves, and cross-validation, with a focus on minimizing false negatives given the clinical risk of a missed interaction.

100K+

Records Processed

ROC-AUC ~0.69

Model Performance

2 Models

LR + Random Forest

Python · scikit-learn · pandas · TwoSIDES · Logistic Regression · Random Forest
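The two evaluation ideas above, rank-based discrimination and a decision threshold chosen to limit false negatives, can be shown without any library. This is a minimal sketch assuming a list of binary labels (1 = clinically significant interaction) and model scores; the function names, the 0.9 recall target, and the toy data are illustrative assumptions, not the project's settings or results.

```python
from __future__ import annotations


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    This pairwise form equals the normalized Mann-Whitney U statistic; ties count half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def threshold_for_recall(
    labels: list[int], scores: list[float], target_recall: float = 0.9
) -> float:
    """Highest score threshold whose recall (sensitivity) reaches target_recall.

    Predicting "significant interaction" when score >= threshold, a lower threshold
    means fewer false negatives at the cost of more false positives.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("no positive examples")
    for t in sorted(set(scores), reverse=True):
        true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        if true_pos / n_pos >= target_recall:
            return t
    return min(scores)  # only reached if target_recall > 1.0


# Toy labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(roc_auc(labels, scores))               # 8 of 9 pairs ranked correctly
print(threshold_for_recall(labels, scores))  # 0.4 catches all three positives
```

The pairwise loop is quadratic, so at 100K+ records a sort-based implementation such as scikit-learn's `roc_auc_score`, with `precision_recall_curve` for threshold selection, is the practical choice; the logic is the same.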

Complete

Cardiovascular Disease Risk Prediction

Built a cardiovascular disease risk prediction model using logistic regression across three public clinical datasets. Performed data harmonization, missing value imputation, and feature selection to produce a unified modeling dataset. Evaluated model calibration and discrimination across demographic subgroups to assess equity implications of risk score deployment in clinical settings.

3 Datasets

Framingham · UCI · Cardio Train

Cross-validated

Generalizability Test

Equity Analysis

Subgroup Fairness

Python · scikit-learn · pandas · Logistic Regression · Model Calibration
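A subgroup calibration check compares, within each demographic group, the mean predicted risk with the observed event rate. The sketch below assumes predictions are available as `(group, predicted_risk, outcome)` tuples; the function name `subgroup_calibration` and the toy data are hypothetical, and this shows only calibration-in-the-large, not the full calibration curves or per-group discrimination the project may use.

```python
from collections import defaultdict


def subgroup_calibration(records):
    """Calibration-in-the-large per demographic subgroup.

    records: iterable of (group, predicted_risk, outcome), outcome in {0, 1}.
    Returns {group: (n, mean_predicted, observed_rate, observed_to_expected)}.
    An observed/expected ratio above 1.0 means the score under-predicts risk for
    that group; below 1.0 means it over-predicts.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # [count, sum of predictions, sum of outcomes]
    for group, risk, outcome in records:
        t = totals[group]
        t[0] += 1
        t[1] += risk
        t[2] += outcome
    summary = {}
    for group, (n, pred_sum, event_sum) in totals.items():
        mean_pred = pred_sum / n
        observed = event_sum / n
        ratio = observed / mean_pred if mean_pred > 0 else float("nan")
        summary[group] = (n, mean_pred, observed, ratio)
    return summary


# Toy predictions: (subgroup, predicted risk, observed outcome).
data = [("F", 0.2, 0), ("F", 0.4, 1), ("M", 0.3, 1), ("M", 0.5, 1)]
for group, (n, pred, obs, ratio) in sorted(subgroup_calibration(data).items()):
    print(f"{group}: n={n} predicted={pred:.2f} observed={obs:.2f} O/E={ratio:.2f}")
```

A score can discriminate well overall while being systematically miscalibrated for one group, which is why the ratio is reported per subgroup rather than once for the whole cohort. With small subgroups these estimates are noisy, so counts are kept alongside them.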

In Progress

Medicare Claims Data Warehouse & Analytics Pipeline

Built a two-layer PostgreSQL data warehouse on real CMS Medicare claims data: a raw layer where data lands as-is and a clean layer where it is transformed and structured. Wrote a Python pipeline that downloads, cleans, and loads the data automatically. Produced ten progressively complex SQL queries covering window functions, CTEs, and PMPM calculations. The final deliverable is a live Looker Studio dashboard with four panels (executive scorecard, PMPM trend, claims by service line, and top high-cost members), filterable by date and service line.

2-Layer

Raw + Clean Warehouse

PMPM + CTEs

Advanced SQL

Looker Studio

Live Dashboard

Python · pandas · PostgreSQL · Advanced SQL · Looker Studio · CMS Medicare
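PMPM (per member per month) is total paid amount divided by member months. This sketch uses Python's built-in `sqlite3` as a local stand-in for PostgreSQL (both support CTEs and window functions) to show a raw-to-clean promotion followed by a CTE-based PMPM query with month-over-month change via `LAG`. The table and column names (`raw_claims`, `clean_claims`, `enrollment`, `paid_amount`) and all values are hypothetical, not the CMS file layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw layer: claims land exactly as received, every field stored as text.
conn.execute(
    "CREATE TABLE raw_claims (claim_id TEXT, member_id TEXT, service_month TEXT, paid_amount TEXT)"
)
conn.executemany(
    "INSERT INTO raw_claims VALUES (?, ?, ?, ?)",
    [
        ("C1", "M1", "2024-01", "100.00"),
        ("C2", "M2", "2024-01", "300.00"),
        ("C3", "M1", "2024-02", "50.00"),
        ("C4", "M2", "2024-02", "n/a"),  # unusable amount, filtered out below
    ],
)

# Enrollment: one row per member per enrolled month (the member-month denominator).
conn.execute("CREATE TABLE enrollment (member_id TEXT, month TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("M1", "2024-01"), ("M2", "2024-01"), ("M1", "2024-02"), ("M2", "2024-02")],
)

# Clean layer: typed amounts; a crude check keeps only values that start with a digit.
conn.execute("""
    CREATE TABLE clean_claims AS
    SELECT claim_id, member_id, service_month,
           CAST(paid_amount AS REAL) AS paid_amount
    FROM raw_claims
    WHERE paid_amount GLOB '[0-9]*'
""")

# PMPM = total paid / member months, plus month-over-month change via LAG.
PMPM_SQL = """
WITH paid AS (
    SELECT service_month AS month, SUM(paid_amount) AS total_paid
    FROM clean_claims
    GROUP BY service_month
),
members AS (
    SELECT month, COUNT(*) AS member_months
    FROM enrollment
    GROUP BY month
),
monthly AS (
    SELECT m.month, COALESCE(p.total_paid, 0.0) / m.member_months AS pmpm
    FROM members AS m
    LEFT JOIN paid AS p ON p.month = m.month
)
SELECT month, pmpm, pmpm - LAG(pmpm) OVER (ORDER BY month) AS pmpm_change
FROM monthly
ORDER BY month
"""
rows = conn.execute(PMPM_SQL).fetchall()
for month, pmpm, change in rows:
    print(month, pmpm, change)
```

Driving the query from enrollment rather than claims keeps members with no claims in the denominator. Dividing only by members who filed claims is a common way PMPM gets overstated.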

In Progress

Automated Healthcare ETL Pipeline with Orchestration & Monitoring

A fully automated three-stage Prefect pipeline that runs on a schedule: Extract pulls fresh CMS data, Transform applies data quality checks and logs bad records to an error table, and Load pushes clean data incrementally into PostgreSQL so that each run adds only new records. Retry logic handles stage failures. A Plotly Dash monitoring dashboard surfaces pipeline run history, records processed per run, error rate over time, and a data quality summary panel: the view an ops team or data manager checks every morning.

3-Stage

Extract · Transform · Load

Prefect

Scheduled Orchestration

Dash Monitor

Live Pipeline Health

Python · pandas · Prefect · PostgreSQL · Plotly Dash · Incremental Load
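Three mechanisms in the pipeline described above can be sketched without Prefect installed: retrying a failed stage, routing records that fail quality checks to an error table, and incremental loading with a high-water mark. Prefect's `@task` decorator accepts `retries` and `retry_delay_seconds` arguments, so the hypothetical `with_retries` helper here only stands in for that. The table names, the `claim_date` watermark column, and the validation rules are assumptions for illustration, and `sqlite3` stands in for PostgreSQL.

```python
import sqlite3
import time


def with_retries(fn, attempts=3, delay_seconds=0.0):
    """Run fn(); on an exception, wait and retry, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def validate(record):
    """Return a list of data-quality problems for one record; empty means clean."""
    problems = []
    if not record.get("member_id"):
        problems.append("missing member_id")
    if not record.get("claim_date"):
        problems.append("missing claim_date")
    amount = record.get("paid_amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid paid_amount")
    return problems


def transform_and_load(conn, records):
    """Log bad records to claims_errors; insert only clean records newer than the watermark."""
    # High-water mark: the latest claim_date already loaded ('' when the table is empty).
    watermark = conn.execute("SELECT COALESCE(MAX(claim_date), '') FROM claims").fetchone()[0]
    loaded = 0
    for rec in records:
        problems = validate(rec)
        if problems:
            conn.execute(
                "INSERT INTO claims_errors (claim_id, reason) VALUES (?, ?)",
                (rec.get("claim_id"), "; ".join(problems)),
            )
        elif rec["claim_date"] > watermark:  # ISO-8601 dates compare correctly as strings
            conn.execute(
                "INSERT INTO claims (claim_id, member_id, claim_date, paid_amount) "
                "VALUES (?, ?, ?, ?)",
                (rec["claim_id"], rec["member_id"], rec["claim_date"], rec["paid_amount"]),
            )
            loaded += 1
    conn.commit()
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT, member_id TEXT, claim_date TEXT, paid_amount REAL)")
conn.execute("CREATE TABLE claims_errors (claim_id TEXT, reason TEXT)")

run_1 = [
    {"claim_id": "C1", "member_id": "M1", "claim_date": "2024-01-05", "paid_amount": 120.0},
    {"claim_id": "C2", "member_id": "", "claim_date": "2024-01-06", "paid_amount": 80.0},
]
# The second extract overlaps the first; only the newer clean record (C3) is loaded.
run_2 = run_1 + [
    {"claim_id": "C3", "member_id": "M2", "claim_date": "2024-01-07", "paid_amount": 45.5},
]
first = with_retries(lambda: transform_and_load(conn, run_1))
second = with_retries(lambda: transform_and_load(conn, run_2))
print(first, second)
```

A date watermark with a strict `>` assumes all records for a given date arrive in the same run; a late record dated on the watermark day would be skipped, so keying on a load timestamp or a unique claim ID is a common alternative.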

In Progress

End-to-End Healthcare Data Integration Platform on Snowflake

A cloud-native data integration platform on Snowflake with three schemas: raw, staging, and analytics. A Python ingestion script loads CMS data into the raw schema via the Snowflake connector. SQL transformation scripts promote data from raw through staging to analytics, cleaning, joining, and computing metrics at each layer. Snowflake Tasks run the transformations on a schedule. Three live Tableau Public dashboards serve as the front end: an executive scorecard with PMPM trend and cost variance, a clinical operations dashboard with readmission flags and ER utilization, and a payer analytics dashboard with denial rate by provider and utilization by service line.

3-Schema

Raw · Staging · Analytics

Snowflake Tasks

Automated Transforms

3 Dashboards

Executive · Clinical · Payer

Snowflake · Python · Advanced SQL · Tableau Public · Snowflake Tasks · CMS Medicare
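The raw to staging to analytics promotion, and one clinical metric it feeds (a 30-day readmission flag), can be sketched locally with `sqlite3` standing in for Snowflake. The table names (`raw_admissions`, `stg_admissions`, `analytics_readmissions`) are hypothetical substitutes for the three schemas, the data are made up, and the flag is a simplified all-cause definition without the exclusions a formal readmission measure would apply. In Snowflake, the `julianday` arithmetic would become `DATEDIFF('day', ...)`, and each `CREATE TABLE ... AS` step could be scheduled as a Snowflake Task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw: admissions as landed, including one duplicate row from a re-sent file.
CREATE TABLE raw_admissions (member_id TEXT, admit_date TEXT, discharge_date TEXT);
INSERT INTO raw_admissions VALUES
    ('M1', '2024-01-01', '2024-01-05'),
    ('M1', '2024-01-01', '2024-01-05'),
    ('M1', '2024-01-20', '2024-01-22'),
    ('M2', '2024-02-01', '2024-02-03'),
    ('M2', '2024-04-10', '2024-04-12');

-- Staging: deduplicated and normalized to one row per stay.
CREATE TABLE stg_admissions AS
SELECT DISTINCT member_id, DATE(admit_date) AS admit_date, DATE(discharge_date) AS discharge_date
FROM raw_admissions;

-- Analytics: flag a stay when the same member's next admission starts
-- within 30 days of this discharge (simplified all-cause definition).
CREATE TABLE analytics_readmissions AS
SELECT member_id, admit_date, discharge_date,
       CASE
           WHEN julianday(LEAD(admit_date) OVER (PARTITION BY member_id ORDER BY admit_date))
                - julianday(discharge_date) <= 30 THEN 1
           ELSE 0
       END AS readmit_30d
FROM stg_admissions;
""")

rows = conn.execute(
    "SELECT member_id, admit_date, readmit_30d FROM analytics_readmissions "
    "ORDER BY member_id, admit_date"
).fetchall()
readmit_rate = conn.execute("SELECT AVG(readmit_30d) FROM analytics_readmissions").fetchone()[0]
print(rows)
print(readmit_rate)
```

Deduplicating in staging before the window function matters: the duplicated January stay in the raw layer would otherwise look like a same-day readmission and inflate the rate.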

Case Studies

Health Informatics Analysis

Structured analyses of real-world healthcare challenges through an informatics, ethics, and policy lens.

Health Informatics & Strategy

Telemedicine as a Health Technology Solution in Jamaica

Examines whether telemedicine can address access, cost, and continuity-of-care challenges in Jamaica, particularly for rural and underserved communities, while remaining aligned with health informatics principles.

Telemedicine reduces geographic and transportation barriers, with the greatest impact in rural communities that lack specialist access

Continuity of care for chronic disease (diabetes, CVD) improves with remote monitoring and regular follow-up support

Effective adoption requires interoperable EHRs, digital infrastructure investment, and regulatory framework development

Telemedicine is a policy and change-management challenge as much as a technology one

Telemedicine · Health Equity · Interoperability · EHR · Caribbean Health Systems

Health Informatics, Security & Privacy

Strengthening Cybersecurity in Healthcare Facilities

Investigates how healthcare organizations can improve their cybersecurity posture to reduce the risk and impact of cyberattacks while protecting sensitive patient data and maintaining clinical operations.

Phishing, ransomware, and insider threats are the dominant attack vectors, often exploiting human behavior rather than technical gaps

EHR outages from cyberattacks directly delay patient care, making cybersecurity a patient-safety issue, not just an IT concern

NIST Cybersecurity Framework applied to structure recommendations across prevention, detection, response, and recovery

Multi-layered strategy combining access controls, staff training, and incident response planning reduces operational impact

Cybersecurity · NIST Framework · PHI Protection · Ransomware · Healthcare IT

Health Informatics, Ethics & Clinical Systems

Profit vs. Care: Ethical Risks in Clinical Decision Support Systems

Examines whether CDSS tools consistently prioritize patient care over financial interests as they become increasingly commercialized, with IBM Watson for Oncology and Epic's sepsis model analyzed as real-world failures.

Vendor financial relationships with pharma can shape CDSS algorithms toward high-cost treatments, often without clinician visibility

IBM Watson for Oncology recommended unsafe treatments due to narrow training data and lack of independent validation

Epic's sepsis model raised concerns about high false-alert rates and limited transparency into recommendation logic

Algorithmic transparency, independent audits, and cost-aware recommendations are essential for ethical CDSS design

CDSS · Responsible AI · Algorithm Transparency · Health Ethics · Clinical Governance

Experience

Career Timeline

A decade of healthcare data and operations, now focused on data analytics and engineering.

Jan 2025 – Present

Data Analyst Intern — AI & Quality Analytics

Northwell Health · Hybrid · New Hyde Park, NY

Evaluated and QA'd an LLM classifying internal healthcare topics for accuracy and workflow alignment. Identified misclassification patterns and delivered structured feedback to improve model performance. Contributed to model validation documentation and responsible AI governance frameworks.

Sept 2025 – Dec 2025

Health Informatics Intern

St. John's Episcopal Hospital · Hybrid · Far Rockaway, NY

Built SQL queries and Tableau dashboards for quality and KPI monitoring across hospital departments. Conducted workflow and performance analysis to identify operational improvement opportunities. Supported HIPAA compliance activities with direct EHR and health IT system exposure.

Oct 2023 – May 2024

Data Analyst — Claims & Operational Analytics

Alma · Remote · New York, NY

Engineered advanced SQL (CTEs, window functions) to validate high-volume claims data, reducing errors by 70%. Conducted cohort and PMPM analysis to surface cost leakage and close performance gaps. Built KPI monitoring datasets supporting operational optimization across service lines.

May 2023 – Oct 2023

Patient Intake & Clinical Analytics Coordinator

K Health · Remote · New York, NY

Analyzed intake and utilization data across Epic and Salesforce to assess workflow efficiency and patient throughput. Built SQL datasets tracking KPI stability and service-line performance trends. Conducted variance analysis to evaluate the operational impact of workflow and process changes.

Sept 2021 – May 2023

Practice Operations Coordinator

Life is Beautiful, MD · Fort Lauderdale, FL

Analyzed reimbursement and denial data to identify revenue drivers and operational bottlenecks. Developed KPI dashboards tracking revenue cycle performance and scheduling efficiency. Delivered data-driven insights that improved collections predictability and workflow performance.

Aug 2016 – Aug 2021

Data Analyst — Claims & Risk Analytics

Sagicor Life Insurance · Hybrid · Kingston, Jamaica

Analyzed multi-year claims datasets to identify cost, utilization, and risk trends for actuarial strategy. Built structured datasets supporting actuarial forecasting and long-term pricing models. Improved operational efficiency by 15% through performance monitoring and trend analysis.

Healthcare Data Professional → Data Engineer
