As a data engineer/architect/director, I aim to create simple, efficient, and scalable data pipelines and architectures. “Simple is Better.” However, with the abundance of tools and abstractions in today's data ecosystem, achieving true simplicity can be challenging.
For instance, when designing a data lake architecture, it is tempting to reach for the latest table formats like Delta Lake, Iceberg (my current favourite), or Hudi for every use case. However, for many datasets a simple partitioned Parquet layout is enough, reducing complexity and maintenance overhead. Paired with a well-thought-out AWS S3 structure, it can be a pretty solid combo:
s3://your-data-lake-bucket/
├── raw/
│   ├── source1/
│   │   ├── YYYY/
│   │   │   ├── MM/
│   │   │   │   ├── DD/
│   │   │   │   │   └── data_files...
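As a rough illustration of how data could land in that layout, here is a minimal Python sketch. The bucket and source names are the hypothetical ones from the tree above, and it assumes pandas with pyarrow and s3fs installed, plus AWS credentials available in the environment:

```python
from datetime import date, datetime, timezone
import pandas as pd

# Hypothetical names matching the tree above.
BUCKET = "your-data-lake-bucket"
SOURCE = "source1"

def write_daily_partition(df: pd.DataFrame, run_date: date) -> str:
    """Write one day's data as Parquet under raw/<source>/YYYY/MM/DD/."""
    prefix = (
        f"s3://{BUCKET}/raw/{SOURCE}/"
        f"{run_date:%Y}/{run_date:%m}/{run_date:%d}"
    )
    # Timestamped file name so repeated runs for the same day don't collide.
    path = f"{prefix}/data_{datetime.now(timezone.utc):%H%M%S}.parquet"
    df.to_parquet(path, index=False)  # pandas delegates to pyarrow + s3fs
    return path

# Example usage with a tiny synthetic DataFrame.
df = pd.DataFrame({"event_id": [1, 2, 3], "value": [10.0, 20.5, 7.3]})
print(write_daily_partition(df, date.today()))
```

If you later want query engines like Athena or Spark to prune partitions automatically, the same idea works with Hive-style prefixes (year=YYYY/month=MM/day=DD) instead of the plain date folders shown here.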
We have powerful tools like Airflow, Spark, and dbt that abstract away much of the complexity. While these tools are good, it is worth understanding their underlying mechanisms to avoid creating a 'black box' architecture that is difficult to debug and optimize. Resist adding this complexity until there is a real need; Simple is Better.
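One lightweight habit that keeps the black box open: look at the plan the engine actually runs, not just the code you wrote. A small sketch with PySpark (assuming Spark 3.x is available; the data and column names are made up for illustration):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("plan-inspection").getOrCreate()

# Tiny in-memory dataset standing in for a real source table.
events = spark.createDataFrame(
    [(1, "click", 10.0), (2, "view", 3.5), (1, "click", 7.2)],
    ["user_id", "event_type", "value"],
)

totals = (
    events
    .filter(F.col("event_type") == "click")
    .groupBy("user_id")
    .agg(F.sum("value").alias("total_value"))
)

# explain() prints the parsed, optimized, and physical plans Spark will execute,
# making filters, shuffles, and aggregations visible instead of hidden.
totals.explain(mode="extended")
```

Reading the physical plan before scaling a job up is often enough to spot an unnecessary shuffle or a missed filter pushdown, without adding any new tooling.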
Elegant design and nimble data structures prevent future headaches.
Quality work brings rewards, such as the satisfaction of a job well done.
That is all for today -