Welcome to My Data Engineering Journey
Hello and welcome to my blog! I’m Salah Algamasy, a passionate Data Engineer and Computer Engineering student with a deep fascination for transforming raw data into meaningful insights.
Academic Background
I am currently pursuing a Bachelor’s in Computer Engineering (2020-2025), where I have developed a rigorous foundation in:
- Data Structures & Algorithms: Optimizing computational efficiency in data processing.
- Distributed Computing: Engineering systems for large-scale data management.
- Database Systems: Designing high-performance storage and retrieval architectures.
- Infrastructure & Systems: Understanding the low-level environments that power data pipelines.
Professional Experience
My technical journey has been defined by focused professional development and hands-on experience in complex data environments:
Data Engineer Intern | DataLoops
May 2025 - Present
Contributing to the architecture of scalable data pipelines and implementing advanced data management practices across diverse technical environments.
Big Data Specialization | Orange Digital Center
August 2024 - September 2024
Advanced training in enterprise-level big data technologies, focusing on containerization and distributed processing:
- Environment Management: Docker/Containerization
- Distributed Computing: Hadoop & Apache Spark
- Data Warehousing: Apache Hive
- Automation: Apache NiFi
- Database Systems: PostgreSQL & MongoDB
Professional Development | NTI & DEPI
February 2024 - September 2024
Intensive technical specialization in ETL architecture and warehouse implementation:
- Pipeline Architecture: ETL processes using SSIS
- Warehouse Design: Architecture and implementation strategies
- Engineering Tools: Python, MS SQL, and Apache Spark
Featured Engineering Projects
I focus on developing end-to-end solutions that address complex data scalability challenges:
Real-Time Analytics Pipeline
Architected a streaming data pipeline using Kafka, Spark Streaming, and Cassandra. Implemented orchestration through Apache Airflow and managed transformations with dbt for real-time analytical consistency.
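To illustrate the kind of aggregation such a pipeline performs, here is a toy pure-Python sketch of tumbling-window counting, the core operation a Spark Structured Streaming job runs continuously over a Kafka topic. The event data and the 60-second window size are purely illustrative, not taken from the actual project.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences per key -- a batch-sized stand-in for the
    windowed aggregation a streaming engine performs incrementally."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}

# Illustrative click events: (unix_timestamp, page)
events = [(0, "home"), (15, "home"), (59, "about"), (61, "home"), (130, "home")]
result = tumbling_window_counts(events, window_seconds=60)
# windows starting at 0, 60, and 120 seconds
```

In a real deployment the engine maintains this state incrementally and handles late-arriving events with watermarks; the sketch only shows the grouping logic.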
Distributed Data Warehouse
Designed a multi-format warehousing solution supporting OLTP and OLAP workloads. Optimized for Parquet and Avro formats using Spark, Hive, and Trino.
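The OLTP/OLAP distinction above can be shown with a minimal sketch: an in-memory SQLite database stands in for the warehouse engines (Hive/Trino), and the sales schema and rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in for a warehouse engine; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    id INTEGER PRIMARY KEY, region TEXT, amount REAL, sold_at TEXT)""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "EU", 120.0, "2024-01-05"),
     (2, "EU", 80.0,  "2024-01-20"),
     (3, "US", 200.0, "2024-01-11")])

# OLTP-style access: fetch a single record by primary key.
one = conn.execute("SELECT region, amount FROM sales WHERE id = ?", (2,)).fetchone()

# OLAP-style access: scan the table and aggregate across it.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
```

Columnar formats like Parquet exist to make the second access pattern cheap at scale, since an aggregate over one column need not read the others.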
ML Lifecycle Automation
Developed a comprehensive data lifecycle project integrating automated ETL, MLflow for experiment tracking, and FastAPI for production deployment.
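As a rough sketch of the experiment-tracking pattern MLflow provides, the toy class below records a run's parameters and metrics and serializes them for later comparison. The class and method names are illustrative, not MLflow's actual API.

```python
import json

class RunTracker:
    """Toy stand-in for an experiment tracker: log the parameters and
    metrics of one training run and serialize the record."""

    def __init__(self, experiment):
        self.record = {"experiment": experiment, "params": {}, "metrics": {}}

    def log_param(self, name, value):
        self.record["params"][name] = value

    def log_metric(self, name, value):
        self.record["metrics"][name] = value

    def to_json(self):
        # Stable ordering so runs can be diffed and compared.
        return json.dumps(self.record, sort_keys=True)

run = RunTracker("churn-model")          # experiment name is hypothetical
run.log_param("learning_rate", 0.01)
run.log_metric("rmse", 0.42)
payload = run.to_json()
```

The real tool adds persistent storage, artifact logging, and a UI on top of this idea; a serving layer such as FastAPI then loads the chosen run's model behind an HTTP endpoint.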
Technical Proficiency
Distributed Processing: Apache Spark, Hadoop, Hive, Kafka
Storage Architecture: PostgreSQL, MongoDB, MySQL, Cassandra
Pipeline Orchestration: Apache Airflow, Apache NiFi, SSIS, dbt
Engineering Environment: Python, SQL, Bash, Docker, Git
Core Objectives
My work is centered on three primary pillars of Data Engineering:
- Strategic Architecture: Building resilient pipelines that operate at industrial scale.
- Knowledge Transmission: Contributing technical insights to the engineering community.
- Technological Advancement: Implementing state-of-the-art solutions in distributed computing.
Professional Connection
I am always keen to engage with fellow engineers and professionals in the data space. Whether discussing distributed system design or pipeline optimization, I look forward to contributing to the broader technical dialogue.
Thank you for joining me on this journey. Stay tuned for deep technical content, project showcases, and insights from the ever-evolving world of data engineering!