Getting Started with Hatched IQ

 
Our CEO, Donal, breaks down our methodology and report structure, using a simple Excel example to highlight key components, and shows how this data can be accessed in Snowflake and S3.

Data Methodology

Hatched Analytics specialises in identifying and collecting proprietary sequential ID (order ID) data: unique identifiers that reflect the volume of orders or transactions processed through a company’s technology stack.
When an online purchase is made, a numeric or alphanumeric identifier is often generated. These identifiers typically form part of a sequence. By collecting sufficient volumes of these signals, Hatched can infer transaction/order volumes processed by the platform.
Sequential IDs may be identified across multiple locations, including:
  • Email receipts
  • URLs or clickstreams
  • JSON page elements
  • Delivered packages
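To make the inference step concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a platform issues numeric IDs that increase by one per order; the observations, dates, and the `implied_orders` helper are hypothetical and are not Hatched's production method.

```python
from datetime import date

# Hypothetical observations: (date the order was placed, numeric order ID seen).
observations = [
    (date(2024, 1, 1), 100_250),
    (date(2024, 1, 3), 100_910),
    (date(2024, 1, 8), 102_480),
    (date(2024, 1, 12), 103_700),
    (date(2024, 1, 15), 104_650),
]


def implied_orders(obs, start, end):
    """Estimate orders in [start, end] as the spread between the lowest and
    highest IDs observed in that window.

    Assumption: IDs increase by exactly one per order.
    """
    ids = [order_id for day, order_id in obs if start <= day <= end]
    if len(ids) < 2:
        return None  # too few signals in the window to measure a spread
    return max(ids) - min(ids)


week_1 = implied_orders(observations, date(2024, 1, 1), date(2024, 1, 7))
week_2 = implied_orders(observations, date(2024, 1, 8), date(2024, 1, 14))
print(week_1, week_2)  # 660 1220
```

With only a few observations per window the spread understates the true count, which is why collecting sufficient volumes of signals matters.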

Data Source

To scale this methodology, Hatched operates a cashback and rewards application. This provides access to a community of more than 20,000 gig workers who share transactional data in exchange for rewards, with full transparency on how the data is used.
This community is not a panel. Instead, these users generate the IDs that allow us to observe real order counts flowing through company platforms on a weekly, monthly, and quarterly basis. Importantly, no sampling is applied. The indices are built directly from the collected signals.
Data is sourced at scale from our community of more than 20,000 gig workers.
 

Report Structure

This section provides an overview of the core dataset components available in Snowflake and S3 for backtesting our transaction indices. It covers the LIVE and PIT structures, core files per ticker, and how to interpret columns. Files are structured the same in Snowflake and S3.
 
You’ll see three top-level components:
  1. LIVE – Latest issued report for each of our data feeds.
  2. PIT (Point-in-Time) – Dated report snapshots for reproducible backtests.
  3. SUPPLEMENTAL – Reference tables (index/KPI descriptions, mappings, quality scores).
 

LIVE Files

These are the most recent reports for each data feed in your subscription. Each delivery includes three LIVE files per ticker, which are replaced with new versions at the next delivery. LIVE file structure is the same across all reports. Those files are:
  1. TS_INDEX – Unmodelled Data
  2. TS_RESULTS – Modelled KPI Estimates & Outcomes
  3. MODEL_DETAILS – Model Spec & Diagnostics
 
  1. TS_INDEX
TS_INDEX is the foundational dataset in Snowflake/S3. It contains the raw, unmodelled values generated from sequential IDs.
The TS_INDEX tables use a consistent format across all reports. Each data feed contains one or more indices, depending on how the IDs are generated:
  • Order/Transaction Index – aggregate order volumes; the baseline index for most tickers. Where only a global sequential ID is available, this may be the sole index.
  • Geo Index – geographic breakdowns, if IDs are generated at that level.
  • Brand Index – brand-level breakdowns, if IDs are generated at that level.
  • F(x)-Weighted Index – weighted order/transaction indices based on f(x) fluctuations, where applicable.
TS_INDEX layout in Snowflake.
 
  2. TS_RESULTS
TS_RESULTS contains the modelled outputs, which use one or more indices to estimate the company’s KPI(s), together with the reported outcomes and error measurements for backtesting.
TS_RESULTS in Snowflake.
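As an illustration of how estimates, reported outcomes, and errors relate in a backtest, the sketch below computes a percentage error per period and a mean absolute percentage error. The row layout, the values, and both helper functions are hypothetical; the actual column names and error definitions in TS_RESULTS may differ.

```python
# Hypothetical rows: (period, modelled KPI estimate, reported KPI).
rows = [
    ("2023Q1", 105.0, 100.0),
    ("2023Q2", 98.0, 100.0),
    ("2023Q3", 110.0, 100.0),
    ("2023Q4", 120.0, None),  # outcome not yet reported
]


def percentage_errors(rows):
    """Return {period: signed % error} for periods with a reported outcome."""
    errors = {}
    for period, estimate, reported in rows:
        if reported is None or reported == 0:
            continue  # nothing to compare against
        errors[period] = (estimate - reported) * 100 / reported
    return errors


def mean_absolute_percentage_error(errors):
    """Average the absolute % errors; None if there are no scored periods."""
    if not errors:
        return None
    return sum(abs(e) for e in errors.values()) / len(errors)


errors = percentage_errors(rows)
mape = mean_absolute_percentage_error(errors)
print(errors)
print(round(mape, 2))
```

Periods without a reported outcome are skipped rather than scored, so the most recent estimate does not distort the historical error.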
 
  3. MODELS
The Models tables document how each index is mapped to its KPI and include the model parameters. Their format is consistent across all reports.
MODELS in Snowflake.
 

PIT Files

The PIT files provide dated archives of the LIVE files for reproducible, point-in-time backtests.
  • Each PIT file name includes a date that corresponds to the last value contained in that snapshot.
  • Use PIT when you need to guarantee that your backtest only sees the data state as of that specific date.
  • Best practice: If you’re measuring “what did we know then?”, pull from PIT (or filter LIVE by RELEASEDATE ≤ your test cut-off).
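
The RELEASEDATE filter mentioned above can be sketched in Python as follows. Only the RELEASEDATE column name comes from this article; the PERIOD and VALUE keys, the sample rows, and the `as_of` helper are hypothetical and stand in for whatever your LIVE extract contains.

```python
from datetime import date

# Hypothetical LIVE rows. Only RELEASEDATE is a column named in the article.
live_rows = [
    {"PERIOD": "2024-01", "VALUE": 101.0, "RELEASEDATE": date(2024, 2, 5)},
    {"PERIOD": "2024-02", "VALUE": 104.5, "RELEASEDATE": date(2024, 3, 4)},
    {"PERIOD": "2024-03", "VALUE": 99.2, "RELEASEDATE": date(2024, 4, 8)},
]


def as_of(rows, cutoff):
    """Keep only rows released on or before the backtest cut-off date."""
    return [row for row in rows if row["RELEASEDATE"] <= cutoff]


known = as_of(live_rows, date(2024, 3, 15))
print([row["PERIOD"] for row in known])  # ['2024-01', '2024-02']
```

A filter like this only reproduces the past data state if earlier versions of any revised rows are still present in LIVE; where that is uncertain, the dated PIT snapshot is the safer source.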
 

SUPPLEMENTAL Files

These files help you interpret data feeds and align them to external systems.
  • DESCRIPTIONS_KPI
  • MAPPINGS
    • Cross-refs to external identifiers (e.g., Bloomberg, Visible Alpha) and KPI naming to enable downstream publishing.