The challenge
- Raw clickstream funnel events (views, clicks, reserves) arrived as deeply nested JSON at large scale.
- Stakeholders needed reliable engagement analytics split by device and rate type (e.g. high-value vs. cheapest).
- Travel products needed to be represented by consistent, documented data models shared across analyses.
Our approach
- Built PySpark jobs that flatten nested funnel events and aggregate engagement at week / month / year grain.
- Orchestrated the flow with Apache Oozie: a coordinator gated each run on input data-availability (done) flags and ran the jobs on a weekly cadence.
- Produced analytics tables for cheapest-rate CTR, high-value engagement, room-rate engagement, and offer positioning.
- Defined and documented standardized data models for travel-product offerings consumed downstream.
- Designed outputs as idempotent full rebuilds, so rerunning a job on the same input reproduces the same tables; these tables feed a Tableau dashboard.
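To make the flatten-and-aggregate step concrete, here is a minimal plain-Python sketch of the logic. It is not the production PySpark code. The event shape, the field names (`event`, `context`, `offer`), and the helper names `flatten`, `period_key`, and `aggregate` are all assumptions made for illustration, since the real funnel schema is not shown here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical nested funnel events; the real schema is not part of this write-up.
SAMPLE_EVENTS = [
    {"event": {"type": "view", "ts": "2023-01-02T10:00:00"},
     "context": {"device": "desktop"}, "offer": {"rate": {"type": "cheapest"}}},
    {"event": {"type": "view", "ts": "2023-01-03T11:00:00"},
     "context": {"device": "desktop"}, "offer": {"rate": {"type": "cheapest"}}},
    {"event": {"type": "click", "ts": "2023-01-03T11:01:00"},
     "context": {"device": "desktop"}, "offer": {"rate": {"type": "cheapest"}}},
    {"event": {"type": "view", "ts": "2023-01-04T09:00:00"},
     "context": {"device": "mobile"}, "offer": {"rate": {"type": "high_value"}}},
    {"event": {"type": "click", "ts": "2023-01-04T09:01:00"},
     "context": {"device": "mobile"}, "offer": {"rate": {"type": "high_value"}}},
    {"event": {"type": "reserve", "ts": "2023-01-04T09:05:00"},
     "context": {"device": "mobile"}, "offer": {"rate": {"type": "high_value"}}},
]


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def period_key(ts, grain):
    """Map an ISO timestamp to a week, month, or year bucket label."""
    day = datetime.fromisoformat(ts).date()
    if grain == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if grain == "month":
        return f"{day.year}-{day.month:02d}"
    if grain == "year":
        return str(day.year)
    raise ValueError(f"unsupported grain: {grain}")


def aggregate(events, grain):
    """Count views, clicks, and reserves per (period, device, rate type) and add CTR."""
    counts = defaultdict(lambda: {"view": 0, "click": 0, "reserve": 0})
    for raw in events:
        row = flatten(raw)
        key = (period_key(row["event_ts"], grain),
               row["context_device"], row["offer_rate_type"])
        counts[key][row["event_type"]] += 1
    result = {}
    for key, c in counts.items():
        # CTR here is clicks divided by views; None when there were no views.
        ctr = c["click"] / c["view"] if c["view"] else None
        result[key] = {**c, "ctr": ctr}
    return result
```

Calling `aggregate(SAMPLE_EVENTS, "week")` groups the sample events into ISO week `2023-W01`, split by device and rate type. Defining CTR as clicks over views is an assumption; the write-up does not give the exact metric definition.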
Architecture
Funnel events
- Nested JSON
- Views · clicks · reserves
Oozie coordinator
- Done-flag gating
PySpark jobs
- Flatten & aggregate
- Week / month / year
Analytics tables
- Idempotent rebuild
- Desktop vs. mobile
Dashboard
- Tableau
Outcomes
- A reliable, reproducible insights pipeline powering the customer-engagement dashboard.
- Consistent engagement metrics across desktop and mobile web.
- Documented, reusable data models standardizing travel-product representation.
