The following captures the original vision for Conductor and the problem space we’re tackling. Our first product is the use case around which we are building the platform described below.

1. Product


One-sentence description

Conductor is a data platform built for the grueling challenges vertical SaaS startups face when integrating data from non-tech industries (e.g., logistics, healthcare, food, and government).

What we’re building and how it works

Conductor is a data platform for building and managing scalable custom data integrations (CDIs), explicitly designed for startups that must wrangle numerous fragile, bespoke data sources.

The startups with these data requirements are often those building for non-tech industries such as government, food, logistics, and healthcare (see examples). Our team knows firsthand how the grueling challenges of working with complex data from these industries can cripple product development.

Integrating data from non-tech industries is massively different from connecting to an API at Salesforce or Stripe. Instead, these startups must regularly shepherd the latest data from numerous sources, brittle formats, and disparate structures into a consistent internal schema. It’s a huge pain. We call this implementation a “custom data integration” (CDI). The following diagram shows where CDIs live within a modern data pipeline:

CDIs in a data pipeline (diagram forked)


Problems:

We know from personal experience that existing data tools do not address the immense obstacles startups face when their product depends on dozens or hundreds of CDIs:

  1. Data onboarding requires months of writing complex transformations that wrangle source-specific data formats into a universal schema, only for those transformations to inevitably break.
  2. Systems must support receiving data in various brittle input formats.
  3. Engineers must individually monitor and debug each source’s data and manually test each CDI.
  4. Engineers neglect writing tests and health checks because they must tediously rewrite them for each CDI.
  5. Modifying the universal internal schema requires individually updating each CDI.
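To make the per-CDI burden concrete, here is a minimal sketch of what even two integrations demand. The vendor names, field names, and universal schema below are hypothetical, invented for illustration only:

```python
from datetime import date

# Hypothetical universal internal schema: every record must end up in this shape.
UNIVERSAL_FIELDS = ("vendor", "shipment_id", "delivered_on")

def from_vendor_a(row: dict) -> dict:
    # Vendor A sends JSON-ish rows with camelCase keys and ISO-8601 dates.
    return {
        "vendor": "vendor_a",
        "shipment_id": row["shipmentId"],
        "delivered_on": date.fromisoformat(row["deliveredOn"]),
    }

def from_vendor_b(row: dict) -> dict:
    # Vendor B sends CSV-style rows with snake_case keys and MM/DD/YYYY dates.
    month, day, year = row["delivery_date"].split("/")
    return {
        "vendor": "vendor_b",
        "shipment_id": row["ship_no"],
        "delivered_on": date(int(year), int(month), int(day)),
    }

# Every new source needs its own transform, plus its own tests, monitoring,
# and health checks -- the per-CDI work described in problems 1-5 above.
TRANSFORMS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}

def ingest(source: str, row: dict) -> dict:
    record = TRANSFORMS[source](row)
    # A schema change here (problem 5) forces edits to every transform above.
    assert set(record) == set(UNIVERSAL_FIELDS)
    return record
```

Multiply this by dozens or hundreds of sources, each with its own quirks and failure modes, and the maintenance cost dominates engineering time.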

<aside> 💀 These data challenges and more make the data ingestion layer the most fragile component of these startups’ systems, despite being critical to their product functionality. In addition, building and managing these pipelines demands so much work that it keeps companies from focusing on their main value proposition (e.g., ML, analytics).

</aside>

What we’re building: