Data Engineer

Building scalable data infrastructure and quality frameworks that power analytics at enterprise scale

About Me

I'm a Data Engineer with 3 years of experience building production-grade data platforms on GCP. Currently at HSBC, I architect the end-to-end data quality framework that validates over 1.36 trillion records and enables reliable analytics for regulatory compliance and business insights.

My passion lies in designing scalable data infrastructure that bridges the gap between raw data and actionable intelligence. From building robust ETL pipelines to implementing real-time streaming architectures, I focus on creating solutions that are reliable, performant, and maintainable.

By the Numbers

  • 1.36T+ records validated
  • 70+ data assets covered
  • 60% Spark runtime reduction
  • 500+ hours saved monthly

My Journey

Academic Foundation in Robotics & AI

2018 - 2022

My journey began with a Bachelor's in Robotics Engineering (First Class Honours) from the University of Plymouth, followed by an MSc in Robotics and Artificial Intelligence from the University of Glasgow. During this time, I built a strong foundation in machine learning, deep learning, computer vision, and algorithm design - skills that would prove invaluable in my data engineering career.

Breaking into Data Engineering

June 2023 - December 2023

I started my professional journey at Digital Futures as a Data Engineering Associate, where I developed production-grade PySpark ETL pipelines and deployed infrastructure using Terraform. This role taught me the fundamentals of building reliable data pipelines, implementing CI/CD workflows, and enforcing data governance practices. I also conducted 25+ technical interviews, which deepened my understanding of what makes a strong data engineer.
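
To give a flavour of that work, here is a minimal sketch of a batch PySpark ETL step in the read-clean-write pattern. The paths, schema, and column names are illustrative placeholders, not code from that role.

```python
# Minimal sketch of a batch PySpark ETL step: extract, transform, load.
# All paths and column names here are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV from object storage (hypothetical bucket)
raw = spark.read.option("header", True).csv("s3a://raw-bucket/customers/")

# Transform: normalise fields, parse dates, deduplicate on the key
clean = (
    raw.withColumn("email", F.lower(F.trim(F.col("email"))))
       .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
       .dropDuplicates(["customer_id"])
)

# Load: write partitioned Parquet for downstream analytics
clean.write.mode("overwrite").partitionBy("signup_date").parquet(
    "s3a://curated-bucket/customers/"
)
```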

Building Enterprise Data Quality at HSBC

January 2024 - Present

At HSBC's ESG department, I took on the challenge of architecting the entire Data Quality platform from the ground up. I researched, evangelized, and secured approval for Great Expectations as our enterprise solution, then designed and built the framework that now validates 1.36 trillion records across 70+ datasets.
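
As a simplified illustration of the approach, the sketch below wraps a Spark DataFrame in Great Expectations' legacy (pre-1.0) SparkDFDataset API and runs a few column-level checks. The bucket path, column names, and expectations are hypothetical, not the production framework.

```python
# Minimal sketch of column-level data quality checks using
# Great Expectations' legacy (pre-1.0) SparkDFDataset API.
# Paths, columns, and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
df = spark.read.parquet("gs://example-bucket/esg/holdings/")  # hypothetical path

checked = SparkDFDataset(df)
checked.expect_column_values_to_not_be_null("record_id")
checked.expect_column_values_to_be_unique("record_id")
checked.expect_column_values_to_be_between("exposure_amount", min_value=0)

result = checked.validate()
print(result.success)  # True only if every expectation passed
```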

The journey involved building cloud-native architectures on GCP, implementing Kafka-based streaming for real-time metrics, optimizing Spark workflows to reduce runtime by 60%, and collaborating with cross-functional teams to establish data governance standards. This experience taught me that impactful data engineering isn't just about writing code - it's about understanding business needs, advocating for the right solutions, and delivering measurable results.
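
A common pattern for real-time quality metrics is to emit a small event per validation run to a Kafka topic. The sketch below illustrates that pattern with confluent-kafka; the broker address, topic name, and payload fields are all illustrative assumptions rather than the production setup.

```python
# Minimal sketch: publishing a data-quality metric event to Kafka
# with confluent-kafka. Broker, topic, and payload are illustrative.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

metric = {
    "dataset": "esg_holdings",        # hypothetical dataset name
    "expectation_suite": "core_checks",
    "success": True,
    "records_checked": 1_250_000,
}

# Key by dataset so consumers can partition metrics per dataset
producer.produce("dq-metrics", key=metric["dataset"], value=json.dumps(metric))
producer.flush()  # block until delivery to keep the sketch deterministic
```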

Looking Forward

2025 and Beyond

I'm passionate about continuing to build scalable data infrastructure that empowers teams to make data-driven decisions. Whether it's optimizing pipeline performance, implementing modern data stacks, or mentoring engineers, I'm driven by the challenge of solving complex data problems at scale.

Technical Skills

Languages & Frameworks

  • Python
  • SQL
  • PySpark
  • Scala
  • Java

Data Engineering

  • Apache Spark (3.x)
  • Airflow / Cloud Composer (2.x)
  • ETL/ELT Pipelines
  • Data Modeling
  • Great Expectations
  • Delta Lake

Cloud & Infrastructure

  • GCP (BigQuery, Dataproc, Cloud Storage)
  • Terraform (Infrastructure as Code)
  • Kafka
  • CloudSQL
  • AWS (Basic)

DevOps & Automation

  • Jenkins CI/CD
  • Maven & Nexus
  • Git
  • Automated Testing
  • Monitoring & Observability

Data Governance

  • GDPR Compliance
  • Data Quality Frameworks
  • Metadata Management
  • Data Security Best Practices
  • Validation & Testing

Collaboration

  • Agile Methodologies
  • Cross-functional Teamwork
  • Technical Documentation
  • Stakeholder Engagement
  • Mentoring Engineers