BigQuery provides various functions to handle and manipulate JSON data. These functions allow you to query, extract, and analyze data from JSON objects, which are commonly used in structured and semi-structured data storage. JSON functions are useful when working with data from APIs, web logs, or complex datasets stored in JSON format.
JSON_EXTRACT()
- Explanation: The JSON_EXTRACT() function extracts a value or a JSON object from a JSON-encoded string. It returns the value at the specified JSON path, which could be a single object or a nested object.
- Usage:
- JSON_EXTRACT(json_string, json_path): Extracts the JSON object or value at the specified json_path from the json_string.
- Example:
SELECT JSON_EXTRACT('{"name": "John", "age": 30}', '$.name') AS extracted_name;
This query extracts the name field from the JSON object, resulting in "John".
SELECT JSON_EXTRACT('{"person": {"name": "John", "age": 30}}', '$.person') AS person_data;
This query extracts the entire person object from the JSON string, resulting in {"name": "John", "age": 30}.
- Practical Use:
JSON_EXTRACT() is useful when you need to extract a specific field or sub-object from a JSON column in a database. For example, extracting customer details from a JSON log:
SELECT JSON_EXTRACT(order_data, '$.customer') AS customer_info
FROM orders;
JSON_EXTRACT_SCALAR()
- Explanation: The JSON_EXTRACT_SCALAR() function extracts a scalar value (like a string or number) from a JSON-encoded string. Unlike JSON_EXTRACT(), which can return an entire JSON object, JSON_EXTRACT_SCALAR() only returns simple scalar values such as strings or numbers.
- Usage:
- JSON_EXTRACT_SCALAR(json_string, json_path): Extracts the scalar value (e.g., string, number) from the json_string at the specified json_path.
- Example:
SELECT JSON_EXTRACT_SCALAR('{"name": "John", "age": 30}', '$.name') AS name;
This query extracts the name value as a scalar, resulting in John (without the double quotes).
SELECT JSON_EXTRACT_SCALAR('{"person": {"name": "John", "age": 30}}', '$.person.age') AS age;
This query extracts the scalar value 30 for the age field inside the person object.
- Practical Use:
JSON_EXTRACT_SCALAR() is particularly useful when you want to retrieve specific values (e.g., strings or numbers) from a JSON object for further analysis or to use in calculations:
SELECT JSON_EXTRACT_SCALAR(order_data, '$.order_total') AS order_total
FROM orders;
This query extracts the order total as a scalar number, which can then be used for aggregations like SUM().
JSON_QUERY()
- Explanation: The JSON_QUERY() function extracts and returns JSON elements from a JSON-encoded string that match a specified JSON path. This function is useful for extracting arrays, nested objects, or other complex structures within the JSON.
- Usage:
- JSON_QUERY(json_string, json_path): Returns the JSON object or array from the json_string that matches the json_path.
- Example:
SELECT JSON_QUERY('{"name": "John", "address": {"city": "New York", "zip": "10001"}}', '$.address') AS address_info;
This query extracts the address object from the JSON string, resulting in {"city": "New York", "zip": "10001"}.
SELECT JSON_QUERY('{"employees": [{"name": "Alice"}, {"name": "Bob"}]}', '$.employees') AS employees_array;
This query extracts the employees array from the JSON object.
- Practical Use:
JSON_QUERY() is useful when you want to retrieve an entire JSON structure (like a nested object or array) from a JSON column. For example, extracting all line items from an order JSON object:
SELECT JSON_QUERY(order_data, '$.line_items') AS line_items
FROM orders;
Summary of Use Cases
JSON functions in BigQuery allow you to efficiently work with JSON data, which is commonly found in logs, API responses, and structured datasets:
- JSON_EXTRACT() allows you to extract specific fields or objects from JSON-encoded strings.
- JSON_EXTRACT_SCALAR() is used to retrieve scalar values (like numbers or strings) from JSON objects for calculations or further analysis.
- JSON_QUERY() extracts entire JSON structures, including arrays and nested objects, for more complex data manipulation.
These functions are critical when dealing with semi-structured data, helping you to navigate and extract meaningful information from JSON fields in BigQuery tables.