Array functions in BigQuery are essential for working with arrays, which are collections of values that can be processed and analyzed as a unit. These functions allow you to manipulate and transform array data efficiently.
ARRAY_AGG()
- Explanation: The ARRAY_AGG() function aggregates values from multiple rows into a single array. This is useful for grouping related values or combining data into a collection.
- Usage:
- ARRAY_AGG(expression [ORDER BY expression] [LIMIT n]): Aggregates values into an array, with optional ordering and a limit.
- Example:
SELECT customer_id, ARRAY_AGG(order_id ORDER BY order_date DESC) AS orders_array
FROM orders
GROUP BY customer_id;
This query groups the order_id values for each customer_id into an array, ordered by order_date in descending order. For example, the result might return something like [123, 456, 789] for a customer with three orders.
- Practical Use:
This function is particularly useful for creating summary or combined views of data where you want to gather multiple values into a single field. For instance, gathering all the items in a customer’s shopping cart into an array:
SELECT customer_id, ARRAY_AGG(item_name) AS cart_items
FROM shopping_cart
GROUP BY customer_id;
ARRAY_CONCAT()
- Explanation: The ARRAY_CONCAT() function concatenates two or more arrays into one larger array. This is useful when you need to combine array data from multiple sources or fields.
- Usage:
- ARRAY_CONCAT(array1, array2, ..., arrayN): Concatenates the input arrays into one.
- Example:
SELECT ARRAY_CONCAT([1, 2, 3], [4, 5], [6, 7, 8]) AS concatenated_array;
This query concatenates three arrays [1, 2, 3], [4, 5], and [6, 7, 8] into one array: [1, 2, 3, 4, 5, 6, 7, 8].
- Practical Use:
This function is useful for combining multiple arrays into a single array for further analysis or processing. For instance, concatenating arrays of tags for different product categories:
SELECT product_id, ARRAY_CONCAT(tags_1, tags_2) AS all_tags
FROM products;
This combines two arrays of tags (tags_1 and tags_2) for each product.
ARRAY_LENGTH()
- Explanation: The ARRAY_LENGTH() function returns the number of elements in an array. It is useful when you need to check the size of an array or ensure that an array contains a certain number of elements.
- Usage:
- ARRAY_LENGTH(array): Returns the count of elements in the given array.
- Example:
SELECT ARRAY_LENGTH([10, 20, 30, 40]) AS array_size;
This query returns 4 because the array contains four elements.
- Practical Use:
This function is helpful when analyzing arrays or ensuring that they meet specific conditions. For instance, checking if a user’s shopping cart contains more than three items:
SELECT customer_id
FROM shopping_cart
WHERE ARRAY_LENGTH(items) > 3;
This query returns customers who have more than three items in their shopping cart.
UNNEST()
- Explanation: The UNNEST() function converts an array into a set of rows. Each element of the array becomes a separate row in the result set. This function is useful for breaking down arrays into individual components for further analysis or joining with other tables.
- Usage:
- UNNEST(array): Converts an array into rows.
- Example:
SELECT customer_id, item
FROM orders, UNNEST(items) AS item;
This query converts the items array in the orders table into individual rows. If a customer ordered 3 items, the query will return 3 rows for that customer, with each row representing one item.
- Practical Use:
The UNNEST() function is critical for transforming array data into a row format for deeper analysis. For instance, when you need to analyze individual items in a shopping cart or details of multiple items in a single order:
SELECT order_id, item, quantity
FROM orders, UNNEST(order_items) AS item, UNNEST(order_quantities) AS quantity;
This unpacks both the order_items and order_quantities arrays into individual rows, making it easier to analyze each item and its corresponding quantity in the order.
Summary of Use Cases
Array functions are essential for managing and manipulating arrays in BigQuery:
- ARRAY_AGG() is used to collect values into an array, which is helpful when summarizing data.
- ARRAY_CONCAT() allows you to merge arrays from different columns or sources.
- ARRAY_LENGTH() is used to check the size of an array, ensuring it meets certain conditions or constraints.
- UNNEST() breaks down arrays into individual rows for further analysis or joining with other tables.
These functions provide flexibility when working with structured or semi-structured data, enabling efficient data manipulation and transformation in BigQuery.