Luna Tech

Tutorials For Dummies.

What does a Data Engineer do?

2021-09-25


1. Why do we need Data Engineers?


Goal:

Input:

Output:

Process:


2. Data Storage Optimizations

Data Warehouse

Stores structured data from different sources in a form suited to analytics.
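To make "structured data from different sources" concrete, here is a minimal Python sketch. It assumes two hypothetical sources (a CRM export and a web-shop API) that describe orders with different field names and formats. The `OrderRow` schema, the `from_crm`/`from_shop` helpers, and all field names are illustrative assumptions, and a real warehouse would be a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical warehouse schema: one consistent shape for every source.
@dataclass(frozen=True)
class OrderRow:
    customer_id: str
    order_date: date
    amount_usd: float
    source: str

# Hypothetical source A: a CRM export with its own field names and string amounts.
crm_records = [
    {"cust": "C001", "date": "2021-09-01", "total": "19.99"},
]

# Hypothetical source B: a shop API that reports amounts in cents.
shop_records = [
    {"customerId": "C002", "orderedOn": "2021-09-02", "amountCents": 4550},
]

def from_crm(rec: dict) -> OrderRow:
    """Map a CRM record onto the warehouse schema."""
    return OrderRow(rec["cust"], date.fromisoformat(rec["date"]), float(rec["total"]), "crm")

def from_shop(rec: dict) -> OrderRow:
    """Map a shop record onto the warehouse schema (cents -> dollars)."""
    return OrderRow(rec["customerId"], date.fromisoformat(rec["orderedOn"]), rec["amountCents"] / 100, "shop")

# The "warehouse" table: every row has the same columns and types.
warehouse = [from_crm(r) for r in crm_records] + [from_shop(r) for r in shop_records]

# Analytics queries become simple once the data is uniform.
total_revenue = sum(row.amount_usd for row in warehouse)
print(round(total_revenue, 2))  # 65.49
```

The work of reconciling formats happens once, before the data reaches analysts, which is what makes the warehouse convenient for reporting.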

Data Lake

Stores all the raw data in its original format, without transforming it first.

As a result, ETL (Extract, Transform, Load) becomes ELT (Extract, Load, Transform):

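Data Engineers handle the EL part (extracting the data and loading it into the Data Lake), and Data Scientists handle the T part (transforming it for their own analysis).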
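The EL/T split can be sketched in Python as follows. This assumes a temporary local directory stands in for the Data Lake and that raw events arrive as JSON strings; the function names `extract_and_load` and `transform_for_analysis`, the file layout, and the event format are hypothetical. The engineer's step writes the payloads unchanged, and the scientist's step parses and reshapes them only when an analysis needs it.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw events exactly as a source system emits them.
RAW_EVENTS = [
    '{"user": "u1", "event": "click", "ts": "2021-09-25T10:00:00"}',
    '{"user": "u2", "event": "purchase", "ts": "2021-09-25T10:05:00", "amount": 12.5}',
    '{"user": "u1", "event": "purchase", "ts": "2021-09-25T11:00:00", "amount": 7.0}',
]

def extract_and_load(events: list[str], lake_dir: Path) -> Path:
    """EL (Data Engineer): land the raw data as-is, with no cleaning or reshaping."""
    target = lake_dir / "events" / "2021-09-25.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(events) + "\n")
    return target

def transform_for_analysis(raw_file: Path) -> dict[str, float]:
    """T (Data Scientist): parse the raw file and compute total spend per user."""
    spend: dict[str, float] = {}
    for line in raw_file.read_text().splitlines():
        record = json.loads(line)
        if record["event"] == "purchase":
            spend[record["user"]] = spend.get(record["user"], 0.0) + record["amount"]
    return spend

with tempfile.TemporaryDirectory() as tmp:
    raw_path = extract_and_load(RAW_EVENTS, Path(tmp))
    print(transform_for_analysis(raw_path))  # {'u2': 12.5, 'u1': 7.0}
```

Because the lake keeps the untouched JSON, a different analysis later (for example, counting clicks) can re-read the same file without the engineer rebuilding the pipeline.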


3. Data Scientist

Data Engineers need to build custom ETL pipelines (ad-hoc tasks) and provide a Data Lake (raw data) for Data Scientists.


4. Big Data

Characteristics

Velocity - Data Streaming

This relates to the Extract part of ETL: in the big data world, new data is generated in real time.

In traditional ETL, we fetch data in batches from the source through API requests and wait for each response; this is called Synchronous Communication.
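A minimal Python sketch of synchronous batch extraction follows. `fake_api_get_page` is a hypothetical stand-in for a paginated HTTP API (latency simulated with `time.sleep`), and the page size is an arbitrary assumption. The point is that the caller blocks on every request and only sees records that already existed when it asked.

```python
import time

# Hypothetical source system: records held behind a paginated API.
SOURCE_RECORDS = [{"id": i, "value": i * 10} for i in range(1, 8)]

def fake_api_get_page(offset: int, limit: int) -> list[dict]:
    """Stand-in for a paginated API request (simulated, not a real endpoint)."""
    time.sleep(0.01)  # simulated network latency; the caller waits here
    return SOURCE_RECORDS[offset:offset + limit]

def extract_batch(page_size: int = 3) -> list[dict]:
    """Synchronous extract: request one page, wait for the reply, then request the next."""
    batch, offset = [], 0
    while True:
        page = fake_api_get_page(offset, page_size)  # blocks until the "response" arrives
        if not page:
            break
        batch.extend(page)
        offset += len(page)
    return batch

records = extract_batch()
print(len(records))  # 7
```

In a traditional pipeline a job like this typically runs on a schedule (for example, once a night), so a record created just after a run waits until the next batch.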

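In the Big Data world, we need Asynchronous Communication instead, by adopting the publish-subscribe (pub-sub) pattern: producers publish events as they happen, and consumers receive them without the producer waiting for a reply.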
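The pub-sub pattern can be sketched in-process with Python's standard library. `Broker` below is a hypothetical toy that gives each subscriber its own `queue.Queue`, and the topic name `"orders"` and message shape are assumptions; a production pipeline would use a dedicated message broker instead. The publisher returns as soon as a message is queued, without waiting for any consumer, which is what makes the communication asynchronous.

```python
import queue
import threading
from collections import defaultdict

class Broker:
    """Toy in-memory pub-sub broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[queue.Queue]] = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, topic: str) -> queue.Queue:
        """Register a new subscriber and hand back its private inbox."""
        q: queue.Queue = queue.Queue()
        with self._lock:
            self._subscribers[topic].append(q)
        return q

    def publish(self, topic: str, message: object) -> None:
        """Put the message on every subscriber's inbox, then return immediately."""
        with self._lock:
            targets = list(self._subscribers[topic])
        for q in targets:
            q.put(message)

STOP = None  # sentinel telling a consumer thread to exit

def consumer(name: str, inbox: queue.Queue, results: list) -> None:
    """Runs in its own thread, processing events as they arrive."""
    while (msg := inbox.get()) is not STOP:
        results.append((name, msg["order_id"]))

broker = Broker()
etl_results: list = []
analytics_results: list = []
threads = [
    threading.Thread(target=consumer, args=("etl", broker.subscribe("orders"), etl_results)),
    threading.Thread(target=consumer, args=("analytics", broker.subscribe("orders"), analytics_results)),
]
for t in threads:
    t.start()

# The producer publishes events as they happen; it never waits for consumers.
for order_id in range(3):
    broker.publish("orders", {"order_id": order_id})
broker.publish("orders", STOP)

for t in threads:
    t.join()
print(etl_results)        # [('etl', 0), ('etl', 1), ('etl', 2)]
print(analytics_results)  # [('analytics', 0), ('analytics', 1), ('analytics', 2)]
```

Two independent consumers read the same topic without knowing about each other, which is why pub-sub suits streaming pipelines where several downstream jobs need the same real-time data.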

Common technologies:

Volume - Distributed Storage and Computing
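When data is too large for one machine, it is split into partitions stored on different nodes, each node processes its own partition, and the partial results are combined. The sketch below imitates that split, process, combine (MapReduce-style) flow on a single machine. Threads from `concurrent.futures.ThreadPoolExecutor` stand in for cluster nodes, so it shows the data flow rather than a real speed-up; the tiny dataset, the partition count, and the word-count task are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset; imagine it being far too large for a single machine.
LINES = [
    "big data needs distributed storage",
    "distributed computing splits the work",
    "each node works on its own partition of the data",
    "results from every node are combined",
]

def partition(data: list[str], n_parts: int) -> list[list[str]]:
    """Distributed storage: spread records across n_parts 'nodes' (round-robin)."""
    return [data[i::n_parts] for i in range(n_parts)]

def map_partition(lines: list[str]) -> Counter:
    """Runs on one 'node': count words in the local partition only."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials: list[Counter]) -> Counter:
    """Combine the partial results from all nodes into the final answer."""
    total: Counter = Counter()
    for c in partials:
        total.update(c)
    return total

parts = partition(LINES, n_parts=2)
# Threads stand in for cluster nodes here; they show the data flow, not a real speed-up.
with ThreadPoolExecutor(max_workers=len(parts)) as pool:
    partial_counts = list(pool.map(map_partition, parts))

word_counts = reduce_counts(partial_counts)
print(word_counts["distributed"], word_counts["node"])  # 2 2
```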

Common technologies:


5. Summary

(Big) Data Engineer

Builds ETL/ELT processes that consume data from different sources and load it into the Data Warehouse and Data Lake for business use.

The Data Warehouse should be designed to suit its end users (Data Analysts, Data Scientists, Machine Learning Engineers, etc.).

Data Scientists

Consume data from the Data Warehouse and Data Lake, develop models, and make predictions.

Data Analysts

Consume data through a BI interface (linked to the Data Warehouse) and build reports.

Machine Learning Engineer

Use the output of ETL to produce real-time recommendations for users.