Hive SQL lineage • Spark • Data lakes

Fix messy data lineage.

Grease Monkey Labs builds practical tools for engineers who need to understand old Hive SQL, tangled data lake dependencies, and field-level lineage without weeks of manual detective work.

The problem

Your data lake knows. Nobody else does.

Years of Hive scripts, temporary tables, CTEs, external tables and forgotten jobs make impact analysis painful. One small field change can become a risky archaeology project.

Legacy SQL

Thousands of scripts, inconsistent naming, old patterns and hidden assumptions.

Missing lineage

Tables are easy-ish. Field-level lineage is where the pain usually starts.

Risky changes

Teams need confidence before changing schemas, feeds, fields or downstream products.

The product

Hive lineage tooling for real-world mess.

Built by a working data engineer for environments where the documentation is out of date, the SQL is complicated, and the answer is needed quickly.

  • Scan Hive SQL repositories
  • Extract table and field dependencies
  • Resolve common Hive variables and database context
  • Handle CTEs, aliases and nested query patterns
  • Export CSV outputs for review, search and audit
  • Create lineage views for impact analysis

lineage-output.csv
source_table,field_name,target_table,usage
raw.orders,customer_id,pdm.sales_fact,join
raw.orders,order_date,pdm.sales_fact,select
ext.web_data,campaign_id,pdm.sales_fact,lookup
pdm.sales_fact,customer_id,mart.customer_view,select

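As a rough illustration of how a CSV like this could feed impact analysis, here is a minimal Python sketch that walks the sample rows to find what sits downstream of a field. It is not the product's implementation. The function names `load_edges` and `downstream` are hypothetical, and it assumes the field keeps the same name at every hop, which holds for `customer_id` in the sample but will not hold for renamed or derived fields.

```python
import csv
import io
from collections import defaultdict, deque

# The sample rows from lineage-output.csv, inlined so the sketch is self-contained.
SAMPLE = """\
source_table,field_name,target_table,usage
raw.orders,customer_id,pdm.sales_fact,join
raw.orders,order_date,pdm.sales_fact,select
ext.web_data,campaign_id,pdm.sales_fact,lookup
pdm.sales_fact,customer_id,mart.customer_view,select
"""


def load_edges(csv_text):
    """Map (source_table, field_name) to a list of (target_table, usage)."""
    edges = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["source_table"], row["field_name"])
        edges[key].append((row["target_table"], row["usage"]))
    return edges


def downstream(edges, table, field):
    """Breadth-first walk from (table, field).

    Returns (source_table, target_table, usage) hops in visit order.
    Assumption: the field name is unchanged across hops.
    """
    seen = {table}
    queue = deque([table])
    hops = []
    while queue:
        current = queue.popleft()
        for target, usage in edges.get((current, field), []):
            hops.append((current, target, usage))
            if target not in seen:  # guard against cycles in the lineage
                seen.add(target)
                queue.append(target)
    return hops


if __name__ == "__main__":
    edges = load_edges(SAMPLE)
    for source, target, usage in downstream(edges, "raw.orders", "customer_id"):
        print(f"{source} -> {target} ({usage})")
```

Run against the sample, a change to `raw.orders.customer_id` reaches `pdm.sales_fact` and then `mart.customer_view`, while `order_date` stops at `pdm.sales_fact`.
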
Who it helps

For teams dealing with old platforms and important data.

Data engineers

Find where a field comes from and what might break downstream.

Data architects

Map dependencies across Hive, Spark and lake-based processing.

Platform teams

Support migrations, audits, refactors and decommissioning work.

AWS Marketplace

Launching soon on AWS Marketplace.

Grease Monkey Labs lineage tools are being packaged for AWS Marketplace so teams can try them inside their existing cloud procurement process.

Contact

Got a horrible lineage problem?

Send over the shape of the problem: Hive, Spark, data lake, migration, impact analysis, field lineage, or anything held together with old SQL and hope.

Grease Monkey Labs

Email: hello@greasemonkeylabs.co.uk

Website: greasemonkeylabs.co.uk

Built in the UK for data engineers who would rather fix the machine than stare at another slide deck.