dataeng

Uses drivers for communication with external tools, supports ODBC and JDBC and connectors like Apache Spark or python.

This article will use some images that belong to: Data Camp - Learn Data Science and AI Online | DataCamp

Flavor

Has its own flavor of SQL

JSON Processing

  • Postgres: Uses JSONB
  • Snowflake: Uses VARIANT
    • Supports both OBJECT and ARRAY data types:
      • OBJECT: { "key": "value" } (key-value pairs)
      • ARRAY: ["list", "of", "values"] (ordered list)
      • Creating a Table for JSON Data
CREATE TABLE cust_info.json_data (
    customer_id INT,
    customer_info VARIANT  -- VARIANT column for semi-structured data
);

PARSE_JSON(expr)

  • expr: A string containing JSON-formatted data
    Returns
  • VARIANT type containing the parsed JSON object
  • Returns NULL if the input string is not valid JSON
SELECT PARSE_JSON(
    -- JSON string enclosed in single quotes
    '{
        "cust_id": 1,
        "cust_name": "cust1",
        "cust_age": 40,
        "cust_email": "cust1***@gmail.com"
    }'
) AS customer_info_json;
KeyValueData Type
cust_age40Integer
cust_emailcust1***@gmail.comString
cust_id1Integer
cust_namecust1String
To create JSON object, use OBJECT_CONSTRUCT.
GROUP BY ALL groups by all.

NUMBER(p, s) - similar to NUMERIC has precision and scale. NUMERIC is aliased it.
TIMESTAMP_LTZ
CAST or :: works the same

Has conversion functions like TO_VARCHAR converts numeric or timestamps into string format, TO_DATE

Alternatively to this:

select column_name, data_type, character_maximum_length, column_default, is_nullable
from INFORMATION_SCHEMA.COLUMNS where table_name = '<name of table>';

We can do this: DESC TABLE orders

INITCAP(column) to capitalize the beginning of a cell.
CURRENT_DATE and CURRENT_TIME for current timestamp
EXTRACT to extract specific part of timestamp.

SELECT COUNT(*AS orders_per_day, 
    EXTRACT(weekday FROM order_date) AS order_day
FROM orders
GROUP BY order_day
ORDER BY orders_per_day DESC


NATURAL JOIN - Removed duplicate columns, does not require ON clause

LATERAL JOIN - Allows subquery to reference columns in preceding tables. In this subquery JOIN inside subquery is not needed

Optimization

  • UNION or UNION ALL: Know the difference

    • UNION removes duplicates, slows down the query
    • UNION ALL is faster if no duplicates
  • Handling big data

    • Use filters to narrow down data
    • Apply limits for quicker results


Apply filtering beforehand, this one is wrong application:

SELECT orders.order_id,
       orders.order_date,
       pizza_type.name,
       pizzas.pizza_size
FROM orders
JOIN order_details
ON orders.order_id = order_details.order_id
JOIN pizzas
ON order_details.pizza_id = pizzas.pizza_id
JOIN pizza_type
ON pizzas.pizza_type_id = pizza_type.pizza_type_id
WHERE orders.order_date = '2015-01-01';  -- Filtering after JOIN
 

And this one is good application:

WITH filtered_orders AS (
    SELECT *
    FROM orders
    WHERE order_date = '2015-01-01'  -- Filtering in CTE before JOIN
)
SELECT filtered_orders.order_id,
       filtered_orders.order_date,
       pizza_type.name,
       pizzas.pizza_size
FROM filtered_orders  -- Joining with CTE
JOIN order_details
ON filtered_orders.order_id = order_details.order_id
JOIN pizzas;
 

Query History

  • Source: snowflake.account_usage.query_history
  • Provides different metrics such as execution time
SELECT query_text, start_time, end_time, execution_time
FROM
    snowflake.account_usage.query_history
WHERE query_text ILIKE '%order_details%'

Query Results

QUERY_TEXTSTART_TIMEEND_TIMEEXECUTION_TIME
SELECT * FROM order_details AS od JOIN pizzas AS p ON od.pizza_id = p.pizza_id2023-09-01 03:44:37.233 -07002023-09-01 03:44:38.309 -0700529
SELECT * FROM order_details AS od JOIN pizzas AS p;2023-09-01 03:43:37.899 -07002023-09-01 03:43:47.369 -07008,747
Note:
  • ILIKE: Case-insensitive string-matching operator