Data Lake Architecture

Topic: Data Lake

Storing Raw Data

A data lake is a central repository that stores raw data in its native format until it is needed for analysis.

Architecture

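Ingestion: data arrives in batches or as streams. Storage: object storage such as Amazon S3 or Google Cloud Storage (GCS). Processing: engines such as Spark and Presto query and transform the stored files.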
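As a rough illustration of how ingested files might be laid out in object storage, the sketch below builds a date-partitioned object key for a raw zone. The `raw/` prefix, the `year=/month=/day=` layout, and the function and argument names are all assumptions made for this example, not a required convention.

```python
from datetime import datetime, timezone


def raw_object_key(source, dataset, event_time, filename):
    """Build a date-partitioned key for the raw zone (hypothetical layout)."""
    # Normalise to UTC so every writer partitions by the same calendar day.
    t = event_time.astimezone(timezone.utc)
    return (
        f"raw/{source}/{dataset}/"
        f"year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"
    )


key = raw_object_key(
    "crm",
    "orders",
    datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc),
    "batch-001.json",
)
print(key)  # raw/crm/orders/year=2024/month=03/day=05/batch-001.json
```

Partitioning keys by date like this lets a processing engine skip whole prefixes when a query only needs a narrow time range.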

Format: columnar formats such as Parquet and ORC suit analytical queries. Delta Lake adds ACID transactions on top of Parquet files.
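One way transactional behaviour can be layered over plain files is an ordered commit log that records which data files are currently part of the table. The toy sketch below shows only that general idea. It is not Delta Lake's actual protocol or API, and the function names, the `op`/`path` action fields, and the file names are hypothetical.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Record one commit; fail if that version number is already taken."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" creates the file exclusively, so two writers cannot
    # both claim the same version.
    with open(path, "x") as f:
        json.dump(actions, f)


def live_files(log_dir):
    """Replay commits in version order to find the current data files."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    files.add(action["path"])
                elif action["op"] == "remove":
                    files.discard(action["path"])
    return files


log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"op": "add", "path": "part-0.parquet"}])
# A rewrite swaps the old file for a new one in a single commit.
commit(log_dir, 1, [
    {"op": "add", "path": "part-1.parquet"},
    {"op": "remove", "path": "part-0.parquet"},
])
print(sorted(live_files(log_dir)))  # ['part-1.parquet']
```

Readers that replay the log see either the state before a commit or the state after it, never a half-applied rewrite.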

Patterns

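Lakehouse: combines data warehouse management and query features with data lake storage. Data mesh: domain-oriented ownership with federated governance. Data fabric: a unified layer that connects data across systems.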

Governance

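Schema on read applies structure when data is queried; schema on write enforces it when data is stored. A data catalog is essential for finding datasets, and a metadata layer records details such as their schemas and locations.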
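The difference between the two approaches can be sketched in a few lines. In this minimal example the `SCHEMA` dictionary, the field names, and both helper functions are hypothetical. Real engines and catalogs handle typing and validation in far more detail.

```python
import json

# Hypothetical schema: field name -> Python type.
SCHEMA = {"user_id": int, "amount": float}


def write_validated(store, record):
    """Schema on write: reject a non-conforming record before storing it."""
    for field, typ in SCHEMA.items():
        if not isinstance(record.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    store.append(record)


def read_with_schema(raw_lines):
    """Schema on read: keep raw lines as-is, apply the schema when querying."""
    for line in raw_lines:
        rec = json.loads(line)
        try:
            yield {field: typ(rec[field]) for field, typ in SCHEMA.items()}
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that cannot be read under this schema


raw = [
    '{"user_id": "7", "amount": "9.5"}',
    '{"user_id": 8}',
    '{"user_id": 9, "amount": 3, "extra": true}',
]
rows = list(read_with_schema(raw))
print(rows)  # [{'user_id': 7, 'amount': 9.5}, {'user_id': 9, 'amount': 3.0}]
```

Schema on read accepts everything at ingestion and defers the cost of bad records to query time, while schema on write pays that cost up front.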

Key Takeaways

  1. Store raw data in native formats
  2. Lakehouse combines lake + warehouse
  3. Schema evolution needs management
