---
name: bigquery-historical-data-aggregator
description: Aggregates and analyzes historical data from multiple BigQuery tables with similar schemas. Queries multiple tables using UNION ALL, calculates aggregate metrics (averages, sums, counts), handles table discovery via INFORMATION_SCHEMA, and processes large datasets efficiently with batch queries.
---
# Instructions
## Core Workflow
- **Discover Tables:** First, query `INFORMATION_SCHEMA.TABLES` to identify all available tables within the target dataset. This ensures the skill adapts to the actual table names present.
- **Aggregate Historical Data:** Construct a SQL query that uses `UNION ALL` to combine data from all identified tables. Calculate the required aggregate metric (e.g., `AVG(score)`) grouped by the relevant key (e.g., `student_id`, `name`).
- **Handle Large Results:** For datasets returning many rows (>50), use a batched querying strategy (e.g., `LIMIT` and `OFFSET`, or filtering by key ranges) to retrieve the complete result set without truncation.
- **Join with Latest Data:** Read the latest data from the provided source (e.g., a local CSV file). Perform a join between the aggregated historical data and the latest data to enable comparative analysis.
- **Calculate Deltas & Filter:** Compute the percentage change or difference between historical and latest values. Apply the user-specified threshold filter (e.g., `drop_percentage > 0.25`).
- **Output Results:** Write the filtered results to the specified output file (e.g., `bad_student.csv`).
- **Trigger Critical Actions:** For records exceeding a higher, critical threshold (e.g., `drop_percentage > 0.45`), execute immediate actions such as writing critical log entries to a designated logging service.
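The discovery, aggregation, and batching steps above can be sketched as SQL query builders. This is a minimal sketch, not part of the skill itself: the `scores_*` table names, the `student_id`/`name`/`score` columns, and the `my-project.grades` dataset are illustrative assumptions.

```python
# Sketch: build the discovery and UNION ALL aggregation queries.
# Table names, column names, and the dataset path are illustrative assumptions.

def build_discovery_query(project: str, dataset: str) -> str:
    """List tables in the dataset via INFORMATION_SCHEMA.TABLES."""
    return (
        f"SELECT table_name "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        f"ORDER BY table_name"
    )

def build_aggregation_query(project, dataset, tables, limit=None, offset=0):
    """Combine all discovered tables with UNION ALL and aggregate by key.

    limit/offset implement the batched-retrieval strategy for large results.
    """
    union = "\nUNION ALL\n".join(
        f"SELECT student_id, name, score FROM `{project}.{dataset}.{t}`"
        for t in tables
    )
    query = (
        f"SELECT student_id, name, AVG(score) AS avg_score\n"
        f"FROM (\n{union}\n)\n"
        f"GROUP BY student_id, name\n"
        f"ORDER BY student_id"
    )
    if limit is not None:
        query += f"\nLIMIT {limit} OFFSET {offset}"
    return query

# Example: table list as returned by the discovery query
tables = ["scores_2022", "scores_2023", "scores_2024"]
sql = build_aggregation_query("my-project", "grades", tables, limit=50, offset=100)
```

The generated SQL can then be submitted through whatever BigQuery client the environment provides; building it dynamically from the discovered table list is what keeps the skill free of hardcoded table names.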
## Key Techniques
- **Dynamic Table Inclusion:** Use the list from `INFORMATION_SCHEMA` to build the `UNION ALL` query dynamically. Do not hardcode table names.
- **Efficient Batch Retrieval:** When the final aggregated list or intermediate results are large, retrieve data in manageable chunks using `WHERE` clauses on sequential keys or `LIMIT`/`OFFSET`.
- **Precise Percentage Calculation:** Ensure the percentage change formula is correct: `(historical_value - latest_value) / historical_value`.
- **Logging for Notification:** When writing critical logs, include all necessary identifiers (e.g., name, ID) and context so downstream systems can trigger alerts or notifications.
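The percentage calculation and the two-tier threshold logic can be sketched in a few lines. The record field names (`avg_score`, `latest_score`) and the sample data are illustrative assumptions; only the formula and the 0.25/0.45 thresholds come from the workflow above.

```python
# Sketch: compute drop percentage and apply the filter and critical thresholds.
# Field names and sample records are illustrative assumptions.

def drop_percentage(historical: float, latest: float) -> float:
    """Percentage drop: (historical_value - latest_value) / historical_value."""
    if historical == 0:
        return 0.0  # guard against division by zero; adjust policy as needed
    return (historical - latest) / historical

def classify(records, filter_threshold=0.25, critical_threshold=0.45):
    """Split joined records into output rows and critical-alert rows."""
    flagged, critical = [], []
    for rec in records:
        pct = drop_percentage(rec["avg_score"], rec["latest_score"])
        if pct > filter_threshold:
            flagged.append({**rec, "drop_percentage": pct})
        if pct > critical_threshold:
            critical.append({**rec, "drop_percentage": pct})
    return flagged, critical

# Illustrative joined rows (historical aggregate + latest value per student)
joined = [
    {"student_id": 1, "name": "Ada", "avg_score": 90.0, "latest_score": 40.0},
    {"student_id": 2, "name": "Ben", "avg_score": 80.0, "latest_score": 70.0},
]
flagged, critical = classify(joined)
```

Note that the formula yields a positive value for a drop, so the thresholds compare against positive fractions; a student whose latest score improved produces a negative value and is filtered out naturally.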
## Error Handling & Validation
- Confirm the target dataset exists before querying.
- Verify that source files (e.g., CSV) exist and are readable.
- Validate that the log bucket or destination for critical alerts exists and is accessible.
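The validation checks above can be sketched for the local inputs; this is a minimal sketch that only covers filesystem paths (the function name and error strings are illustrative). Verifying that the BigQuery dataset and the log bucket exist would use the BigQuery and Cloud Storage client libraries in a real run.

```python
# Sketch: pre-flight validation of local inputs before running the workflow.
# Function name and messages are illustrative; cloud-side checks (dataset,
# log bucket) would go through the respective client libraries.
import os

def validate_inputs(csv_path: str, log_dir: str) -> list[str]:
    """Return a list of validation errors; an empty list means inputs look OK."""
    errors = []
    if not os.path.isfile(csv_path):
        errors.append(f"source file not found: {csv_path}")
    elif not os.access(csv_path, os.R_OK):
        errors.append(f"source file not readable: {csv_path}")
    if not os.path.isdir(log_dir):
        errors.append(f"log destination missing: {log_dir}")
    return errors
```

Collecting all errors into a list, rather than raising on the first failure, lets the skill report every missing prerequisite in one pass.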
## Bundled Resources
- `scripts/aggregate_query_template.sql`: A parameterized SQL template for the core aggregation logic.
- `references/schema_example.md`: An example schema to illustrate the expected table structure.