AWS Glue

Jun 20, 2025 · 1 min read

Serverless service for ETL and ELT
Crawls data sources and generates a Data Catalog

Sources:

Stores: AWS S3 Simple Storage Service, AWS RDS, any JDBC-compatible DB (such as AWS Redshift) & AWS DynamoDB (DynamoDB is a native source, not a JDBC one)
Streams: Kinesis Data Streams & Apache Kafka
Targets:
AWS S3 Simple Storage Service, AWS RDS, any JDBC-compatible DB

Data Catalog

IMPORTANT

A Crawler creates metadata in the Data Catalog

A Glue job performs ETL by reading table definitions from the Data Catalog. Jobs run on serverless infrastructure and can be started by a trigger or manually
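The crawler-then-job flow above can be sketched with the boto3 Glue client. This is a minimal sketch, not a definitive setup: `build_crawler_request` and `crawl_then_run_job` are hypothetical helper names, and the crawler name, role ARN, database, S3 path and job name are all assumed values supplied by the caller. It also assumes the Glue job already exists.

```python
import time


def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler uses to read the source
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def crawl_then_run_job(glue, crawler_request, job_name, poll_seconds=30):
    """Create and start a crawler, wait for it, then start a job run.

    `glue` is expected to behave like boto3.client("glue").
    Returns the id of the started job run.
    """
    crawler_name = crawler_request["Name"]
    glue.create_crawler(**crawler_request)
    glue.start_crawler(Name=crawler_name)

    # Poll until the crawler reports the READY state again.
    while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    # The job can now rely on the metadata the crawler wrote to the catalog.
    run = glue.start_job_run(JobName=job_name)
    return run["JobRunId"]
```

With real credentials, `glue` would be `boto3.client("glue")`, created in the region whose catalog should be populated. Passing the client in as a parameter keeps the helpers easy to exercise with a stand-in object.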

Persistent storage of metadata about sources within a region.
One catalog per region per account

Avoids data silos: improves visibility, makes data structure browsable and reduces clutter
Amazon Athena, Amazon Redshift Spectrum, AWS EMR & AWS Lake Formation all use the Data Catalog
Data is discovered by crawlers, which are given credentials and pointed at sources.
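To illustrate what "browsable" can mean in practice, the sketch below lists the tables of one Data Catalog database along with their locations and columns. It assumes a client that behaves like `boto3.client("glue")` and uses its `get_tables` paginator. `summarize_tables` and `browse_catalog` are hypothetical helper names, and the database name is an assumption supplied by the caller.

```python
def summarize_tables(table_list):
    """Map each table name to its storage location and (column, type) pairs."""
    summary = {}
    for table in table_list:
        descriptor = table.get("StorageDescriptor", {})
        summary[table["Name"]] = {
            "location": descriptor.get("Location"),
            "columns": [
                (column["Name"], column.get("Type"))
                for column in descriptor.get("Columns", [])
            ],
        }
    return summary


def browse_catalog(glue, database):
    """Collect every table in one catalog database, following pagination."""
    tables = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        tables.extend(page["TableList"])
    return summarize_tables(tables)
```

Because there is one catalog per region per account, the region the client is created in decides which catalog is being browsed.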
