Hi, I'm Hayatu Abdullahi

Data Engineer

I build scalable data infrastructure and quality frameworks that transform raw data into reliable analytics. With 3 years of experience on GCP, I've architected platforms that validate over 1.36 trillion records and enable data-driven decision-making at enterprise scale.

Making Data Work at Scale

I specialize in designing and building production-grade data platforms that handle massive scale while maintaining reliability, performance, and data quality.

1.36T+

Records Validated

70+

Data Assets Managed

60%

Performance Improvement

500+

Engineering Hours Saved Monthly

Experience

Building data infrastructure that powers analytics and drives business decisions

Specialist Data Engineer

HSBC - ESG Data Quality
January 2024 - Present | London, UK

Together with my team, architected and built HSBC's ESG Data Quality framework from the ground up, establishing the technical foundation for enterprise-wide data governance and validation.

  • Designed end-to-end data quality framework using Great Expectations, Python, and Spark on GCP – validating 1.36T+ records across 70+ datasets
  • Cut heavy-job runtimes by ~60% via partitioning and shuffle hygiene on 250M+ row assets
  • Built real-time DQ monitoring with Kafka and CloudSQL metadata, reducing investigation time by ~40%
  • Built on GCP infrastructure (BigQuery, Dataproc, Composer), with CI/CD via Jenkins
  • Partnered with analysts, product, and architecture teams to embed governance
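
For readers unfamiliar with rule-based data quality checks, the idea behind the validation work above can be sketched in a few lines of plain Python. This is an illustrative sketch only: it is not the HSBC framework and does not use the Great Expectations API. The `Rule` class, the `validate` function, and the `emissions` column are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Rule:
    """A single named check applied to one column (hypothetical structure)."""
    name: str
    column: str
    check: Callable[[Optional[float]], bool]


def validate(rows: Iterable[dict], rules: List[Rule]) -> Dict[str, Dict[str, int]]:
    """Apply every rule to every row and count passes and failures per rule."""
    summary = {rule.name: {"passed": 0, "failed": 0} for rule in rules}
    for row in rows:
        for rule in rules:
            ok = rule.check(row.get(rule.column))
            summary[rule.name]["passed" if ok else "failed"] += 1
    return summary


# Hypothetical rules on a made-up "emissions" column.
rules = [
    Rule("emissions_not_null", "emissions", lambda v: v is not None),
    # Nulls are left to the not-null rule, so they pass this one.
    Rule("emissions_non_negative", "emissions", lambda v: v is None or v >= 0),
]

rows = [{"emissions": 12.5}, {"emissions": None}, {"emissions": -3.0}]
result = validate(rows, rules)
print(result)
```

Keeping each rule as a small named object makes the pass/fail counts easy to store as metadata and to monitor over time. At production scale the same pattern would run on a distributed engine such as Spark rather than in a Python loop.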

Data Engineering Associate

Digital Futures
June 2023 - December 2023 | London, UK

Built production-grade pipelines and infrastructure across client environments using modern data engineering practices.

  • PySpark ETL pipelines: +30% throughput, ~99% accuracy
  • Terraform + CI/CD deployments
  • GDPR-aligned governance and security
  • 25+ tech interviews; mentored juniors on Python/SQL/pipeline design

Technical Expertise

A comprehensive toolkit for building scalable, reliable data platforms

🐍 Languages

Python, SQL, Scala, PySpark – production-grade DE

☁️ GCP

BigQuery, Dataproc, GCS, Composer, CloudSQL

⚙️ Processing

Spark (3.x), Airflow, ETL/ELT, Delta

🏗️ IaC & CI/CD

Terraform, Jenkins, Maven, Git

✅ Data Quality

Great Expectations, custom validations

📊 Architecture

Modeling, warehouse design, batch/stream

🔄 Real-time

Kafka, event-driven

🔒 Governance

GDPR, security, observability

Let's Build Something Together

I'm always interested in discussing data engineering challenges and opportunities. Whether you're building new infrastructure or optimizing existing pipelines, let's connect.