Welcome to My Data Engineering Journey
Hello and welcome to my blog! I’m Salah Algamasy, a passionate Data Engineer and Computer Engineering student with a deep fascination for transforming raw data into meaningful insights.
Academic Background
I am currently pursuing a Bachelor’s in Computer Engineering (2020-2025), where I have developed a rigorous foundation in:
- Data Structures & Algorithms: Optimizing computational efficiency in data processing.
- Distributed Computing: Engineering systems for large-scale data management.
- Database Systems: Designing high-performance storage and retrieval architectures.
- Infrastructure & Systems: Understanding the low-level environments that power data pipelines.
Professional Experience
My technical journey has been defined by focused professional development and hands-on experience in complex data environments:
Data Engineer Intern | DataLoops
May 2025 - Present
Contributing to the architecture of scalable data pipelines and implementing advanced data management practices across diverse technical environments.
Big Data Specialization | Orange Digital Center
August 2024 - September 2024
Advanced training in enterprise-level big data technologies, focusing on containerization and distributed processing:
- Environment Management: Docker/Containerization
- Distributed Computing: Hadoop & Apache Spark
- Data Warehousing: Apache Hive
- Automation: Apache NiFi
- Database Systems: PostgreSQL & MongoDB
Professional Development | NTI & DEPI
February 2024 - September 2024
Intensive technical specialization in ETL architecture and warehouse implementation:
- Pipeline Architecture: ETL processes using SSIS
- Warehouse Design: Architecture and implementation strategies
- Engineering Tools: Python, MS SQL, and Apache Spark
Featured Engineering Projects
I focus on developing end-to-end solutions that address complex data scalability challenges:
Real-Time Analytics Pipeline
Architected a streaming data pipeline using Kafka, Spark Streaming, and Cassandra. Implemented orchestration through Apache Airflow and managed transformations with dbt for real-time analytical consistency.
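To illustrate the kind of aggregation such a pipeline performs, here is a toy pure-Python sketch of tumbling-window counting, the core operation a Spark Structured Streaming job runs continuously over a Kafka topic. The event data and the 60-second window size are purely illustrative, not taken from the actual project.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences per key -- a batch-sized stand-in for the
    windowed aggregation a streaming engine performs incrementally."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}

# Illustrative click events: (unix_timestamp, page)
events = [(0, "home"), (15, "home"), (59, "about"), (61, "home"), (130, "home")]
result = tumbling_window_counts(events, window_seconds=60)
# windows starting at 0, 60, and 120 seconds
```

In a real deployment the engine maintains this state incrementally and handles late-arriving events with watermarks; the sketch only shows the grouping logic.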
Distributed Data Warehouse
Designed a multi-format warehousing solution supporting OLTP and OLAP workloads. Optimized for Parquet and Avro formats using Spark, Hive, and Trino.
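The OLTP/OLAP distinction above can be shown with a minimal sketch: an in-memory SQLite database stands in for the warehouse engines (Hive/Trino), and the sales schema and rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in for a warehouse engine; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    id INTEGER PRIMARY KEY, region TEXT, amount REAL, sold_at TEXT)""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "EU", 120.0, "2024-01-05"),
     (2, "EU", 80.0,  "2024-01-20"),
     (3, "US", 200.0, "2024-01-11")])

# OLTP-style access: fetch a single record by primary key.
one = conn.execute("SELECT region, amount FROM sales WHERE id = ?", (2,)).fetchone()

# OLAP-style access: scan the table and aggregate across it.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
```

Columnar formats like Parquet exist to make the second access pattern cheap at scale, since an aggregate over one column need not read the others.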
ML Lifecycle Automation
Developed a comprehensive data lifecycle project integrating automated ETL, MLflow for experiment tracking, and FastAPI for production deployment.
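As a rough sketch of the experiment-tracking pattern MLflow provides, the toy class below records a run's parameters and metrics and serializes them for later comparison. The class and method names are illustrative, not MLflow's actual API.

```python
import json

class RunTracker:
    """Toy stand-in for an experiment tracker: log the parameters and
    metrics of one training run and serialize the record."""

    def __init__(self, experiment):
        self.record = {"experiment": experiment, "params": {}, "metrics": {}}

    def log_param(self, name, value):
        self.record["params"][name] = value

    def log_metric(self, name, value):
        self.record["metrics"][name] = value

    def to_json(self):
        # Stable ordering so runs can be diffed and compared.
        return json.dumps(self.record, sort_keys=True)

run = RunTracker("churn-model")          # experiment name is hypothetical
run.log_param("learning_rate", 0.01)
run.log_metric("rmse", 0.42)
payload = run.to_json()
```

The real tool adds persistent storage, artifact logging, and a UI on top of this idea; a serving layer such as FastAPI then loads the chosen run's model behind an HTTP endpoint.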
Technical Proficiency
Distributed Processing: Apache Spark, Hadoop, Hive, Kafka
Storage Architecture: PostgreSQL, MongoDB, MySQL, Cassandra
Pipeline Orchestration: Apache Airflow, Apache NiFi, SSIS, dbt
Engineering Environment: Python, SQL, Bash, Docker, Git
Core Objectives
My work is centered on three primary pillars of Data Engineering:
- Strategic Architecture: Building resilient pipelines that operate at industrial scale.
- Knowledge Transmission: Contributing technical insights to the engineering community.
- Technological Advancement: Implementing state-of-the-art solutions in distributed computing.
Professional Connection
I am always keen to engage with fellow engineers and professionals in the data space. Whether discussing distributed system design or pipeline optimization, I look forward to contributing to the broader technical dialogue.
Thank you for joining me on this journey. Stay tuned for deep technical content, project showcases, and insights from the ever-evolving world of data engineering!