Serverless service for ETL and ELT
Crawls data sources and populates the Data Catalog

Sources:

Data Catalog

IMPORTANT

A crawler creates metadata in the Data Catalog

A Glue job performs ETL using the metadata in the Data Catalog. Jobs run serverless and can be started by a trigger or manually.
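Starting a job manually could look roughly like the sketch below. It assumes boto3 is installed and AWS credentials are configured. The job name, the argument keys and the bucket path are hypothetical placeholders, not values from these notes.

```python
def build_job_run_request(job_name, source_table, target_path):
    """Build keyword arguments for glue.start_job_run.

    The "--source_table" and "--target_path" keys are hypothetical job
    arguments; a real job defines its own.
    """
    return {
        "JobName": job_name,
        # Job arguments are passed as string key/value pairs.
        "Arguments": {
            "--source_table": source_table,
            "--target_path": target_path,
        },
    }


def start_job(request, region_name="us-east-1"):
    """Start the job on demand. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above runs without boto3

    glue = boto3.client("glue", region_name=region_name)
    response = glue.start_job_run(**request)
    return response["JobRunId"]


# Hypothetical job name, catalog table and output location.
request = build_job_run_request(
    "orders-etl", "raw_orders", "s3://example-bucket/clean/orders/"
)
```

Calling start_job(request) would submit the run; the builder is kept separate so the request can be checked without touching AWS.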

Persistent storage of metadata about sources within a region.
One catalog per region per account
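Because the catalog is scoped to one region in one account, the same account can show different databases in each region. A minimal sketch of checking this, assuming boto3 Glue clients (the client_factory parameter and the helper names are illustrative, not part of any AWS API):

```python
def list_database_names(glue_client):
    """Return the names of all databases in the catalog the client points at."""
    names = []
    paginator = glue_client.get_paginator("get_databases")
    for page in paginator.paginate():
        names.extend(db["Name"] for db in page["DatabaseList"])
    return names


def databases_by_region(regions, client_factory):
    """Map each region to the database names in that region's catalog.

    client_factory(region) is assumed to return a Glue client for the region,
    for example: lambda r: boto3.client("glue", region_name=r)
    """
    return {
        region: list_database_names(client_factory(region))
        for region in regions
    }
```

Passing the client in, rather than creating it inside the function, keeps the region choice explicit and lets the helper be exercised with a stand-in client.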

  • Avoids data silos: improves visibility, makes data structure browsable, and reduces clutter
    Amazon Athena, Amazon Redshift Spectrum, Amazon EMR & AWS Lake Formation all use the Data Catalog
    Data is discovered by crawlers, which are given credentials and pointed at sources.
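
Giving a crawler credentials and pointing it at a source might be sketched as follows. This assumes boto3 and an S3 source. The crawler name, role ARN, database name and bucket path are all hypothetical.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for glue.create_crawler with one S3 target."""
    return {
        "Name": name,
        "Role": role_arn,          # credentials: IAM role the crawler assumes
        "DatabaseName": database,  # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # the source to crawl
    }


def create_and_start_crawler(request, region_name="us-east-1"):
    """Create the crawler, then run it once. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without boto3

    glue = boto3.client("glue", region_name=region_name)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values for illustration only.
crawler_request = build_crawler_request(
    "orders-crawler",
    "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    "sales",
    "s3://example-bucket/raw/orders/",
)
```

After the crawler finishes, the tables it inferred appear in the named database of the region's Data Catalog.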