Serverless service for ETL and ELT
Crawls data sources and generates the Data Catalog
Sources:
- Stores: AWS S3 (Simple Storage Service), AWS RDS, AWS DynamoDB, any JDBC-compatible DB such as AWS Redshift
- Streams: Kinesis Data Streams & Apache Kafka
Targets:
- AWS S3 (Simple Storage Service), AWS RDS, any JDBC-compatible DB
Data Catalog
IMPORTANT
A Crawler creates metadata in the Data Catalog
A Glue job performs ETL using the sources and targets described in the Data Catalog; jobs run on serverless compute and can be started manually or by a trigger (schedule or event)
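A minimal sketch of starting a job by hand versus on a schedule, assuming a boto3 Glue client is passed in. The helper names `run_job_now` and `schedule_job`, and the job and trigger names in the usage comment, are hypothetical and only for illustration; `start_job_run` and `create_trigger` are the boto3 calls being wrapped.

```python
def run_job_now(glue_client, job_name, arguments=None):
    """Start a Glue job on demand and return the run id."""
    kwargs = {"JobName": job_name}
    if arguments:
        # Glue job arguments are passed as "--key": "value" strings.
        kwargs["Arguments"] = {f"--{k}": str(v) for k, v in arguments.items()}
    response = glue_client.start_job_run(**kwargs)
    return response["JobRunId"]


def schedule_job(glue_client, trigger_name, job_name, cron_expression):
    """Create a scheduled trigger that starts the job without manual action."""
    glue_client.create_trigger(
        Name=trigger_name,
        Type="SCHEDULED",
        Schedule=f"cron({cron_expression})",
        Actions=[{"JobName": job_name}],
        StartOnCreation=True,
    )


# Usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   glue = boto3.client("glue", region_name="eu-west-1")
#   run_job_now(glue, "nightly-etl", {"target_date": "2023-02-15"})
#   schedule_job(glue, "nightly-etl-trigger", "nightly-etl", "0 2 * * ? *")
```

Passing the client in keeps the helpers testable without touching a real account.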
Persistent storage of metadata about sources within a region.
One catalog per region per account
- Avoids data silos: improves visibility, makes data structure browsable and reduces clutter
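To illustrate what "browsable" means in practice, here is a small sketch that lists the tables of one catalog database with their storage location and columns. It assumes a boto3 Glue client created for the region whose catalog you want to read; `list_catalog_tables` and the database name in the usage comment are hypothetical.

```python
def list_catalog_tables(glue_client, database):
    """Return name, location and columns for every table in a catalog database."""
    paginator = glue_client.get_paginator("get_tables")
    tables = []
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            descriptor = table.get("StorageDescriptor", {})
            tables.append(
                {
                    "name": table["Name"],
                    "location": descriptor.get("Location"),
                    "columns": [
                        (col["Name"], col["Type"])
                        for col in descriptor.get("Columns", [])
                    ],
                }
            )
    return tables


# Usage (needs boto3 and AWS credentials; the client's region selects the catalog):
#   import boto3
#   glue = boto3.client("glue", region_name="eu-west-1")
#   for t in list_catalog_tables(glue, "sales_db"):
#       print(t["name"], t["location"], t["columns"])
```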
Amazon Athena, Redshift Spectrum, AWS EMR & AWS Lake Formation all use the Data Catalog
Data is discovered by crawlers, which are given credentials and pointed at sources.
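A hedged sketch of "credentials plus sources" as a crawler definition, assuming boto3. The credentials come from the IAM role (and, for JDBC stores, a Glue connection); the sources are the targets. `build_crawler_request` and `create_and_start_crawler` are hypothetical helper names, and every ARN, bucket and connection name in the usage comment is a placeholder.

```python
def build_crawler_request(name, role_arn, database,
                          s3_paths=(), jdbc_targets=(), dynamodb_tables=()):
    """Build keyword arguments for the Glue create_crawler call.

    jdbc_targets is an iterable of (connection_name, path) pairs.
    """
    targets = {}
    if s3_paths:
        targets["S3Targets"] = [{"Path": p} for p in s3_paths]
    if jdbc_targets:
        targets["JdbcTargets"] = [
            {"ConnectionName": conn, "Path": path} for conn, path in jdbc_targets
        ]
    if dynamodb_tables:
        targets["DynamoDBTargets"] = [{"Path": t} for t in dynamodb_tables]
    if not targets:
        raise ValueError("a crawler needs at least one source to point at")
    return {
        "Name": name,
        "Role": role_arn,          # credentials the crawler runs with
        "DatabaseName": database,  # catalog database that receives the metadata
        "Targets": targets,        # sources to crawl
    }


def create_and_start_crawler(glue_client, request):
    """Register the crawler, then run it once to populate the Data Catalog."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   glue = boto3.client("glue", region_name="eu-west-1")
#   req = build_crawler_request(
#       "sales-crawler", "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       "sales_db", s3_paths=["s3://example-bucket/sales/"])
#   create_and_start_crawler(glue, req)
```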
![](/Attachments/Pasted-image-20230215023559.png)