Introduction to dpm

Welcome to the dpm documentation!

What is dpm?

dpm stands for data package manager.

Using the dpm CLI or dpm web application, you can define, build, and publish data packages: code packages with a live, embedded connection to a data source. Users can upgrade data packages for lower latency, more reliability, stricter access controls, schema evolution, and time travel.

Why data packages?

As products scale, database workloads evolve. As organizations scale, teams often need to leverage other teams' data to build their products. As a result, engineering teams often fall into one of two traps.

Sub-optimal use of existing components:

  • Stretching Postgres to the max
  • Using MongoDB for analytics
  • Querying a data warehouse directly from production

Complex web of pipelines, streams, and databases:

  • Copying data from one team's database to another's
  • Introducing specialized databases for each incremental use case
  • Heavy coordination through schema changes & migrations

Data packages are designed as a replacement for low-level data engineering work.

What is a data package?

Data packages enable engineers to safely query data, no matter where it is stored.

Data packages include:

  1. Declarative query interfaces with type-safety in the developer’s preferred runtime language, such as Python or TypeScript
  2. Embedded access policies for federated governance
  3. Change management through a familiar package versioning workflow
  4. Performance & consistency configurations for highly reliable, low-latency apps
  5. Metadata, notably a version, maintainer, and description of intended usage and constraints

Data packages replace:

  1. Pipelines into operational & online analytics stores
  2. Caches and/or read-replicas
  3. API & SDK development

Features & use cases

  • Securely distribute and import data products
  • Query & enrich data from any source like a data micro-service
  • Take analytical workloads off operational databases with no infrastructure setup
  • Build apps & services with generated, type-safe query interfaces derived from a dataset schema
  • Query data immediately without waiting on direct database access
  • Perform time series bucketing, aggregations, grouping, filtering and sorting without writing complicated SQL queries
  • Run analytical queries over large datasets with low latency and without hitting the underlying storage system
  • Use data package versions to safely update your schemas without impacting downstream consumers
  • Leverage data from Snowflake, BigQuery, or Databricks in customer-facing applications
  • Look up single-row records with single-digit millisecond response times
  • Turn your dbt models into data packages in minutes

How do data packages work?

Data producers define a data package by selecting tables from a data source. Then, a client package is generated from the tables' schemas, with configurable query interfaces in popular runtimes like Python and TypeScript.
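To make the generation step concrete, here is a minimal sketch of how a typed row interface could be derived from a table schema. It is an illustration only, not dpm's actual generator: the `TableSchema` and `ColumnSchema` shapes, the `generateRowInterface` function, and the column types are all assumptions. Only the table and field names are borrowed from the example on this page.

```typescript
// Hypothetical schema description. dpm's real schema format is not shown here.
type ColumnType = "string" | "number" | "boolean" | "timestamp";

interface ColumnSchema {
  name: string;
  type: ColumnType;
  nullable: boolean;
}

interface TableSchema {
  name: string;
  columns: ColumnSchema[];
}

// Map each source column type to the TypeScript type exposed to consumers.
const TS_TYPES: Record<ColumnType, string> = {
  string: "string",
  number: "number",
  boolean: "boolean",
  timestamp: "Date",
};

// Emit the source text of a TypeScript interface for one table.
function generateRowInterface(table: TableSchema): string {
  const fields = table.columns.map((col) => {
    const tsType = TS_TYPES[col.type] + (col.nullable ? " | null" : "");
    return `  ${col.name}: ${tsType};`;
  });
  return [`export interface ${table.name}Row {`, ...fields, "}"].join("\n");
}

// Column types and nullability below are assumed for illustration.
const factsAppEngagement: TableSchema = {
  name: "FactsAppEngagement",
  columns: [
    { name: "appTitle", type: "string", nullable: false },
    { name: "foregroundduration", type: "number", nullable: true },
    { name: "panelistid", type: "string", nullable: false },
    { name: "starttimestamp", type: "timestamp", nullable: false },
  ],
};

const generatedSource = generateRowInterface(factsAppEngagement);
```

Because the interface is derived from the schema, a consumer who misspells a field or treats a nullable column as always present gets a compile-time error instead of a runtime failure.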

The generated package can be published to registries like npm and PyPI, so consumers can install it using a familiar npm or pip workflow. They can also safely upgrade as the schema or other properties of the data package are updated.
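One way to reason about a safe upgrade is to map schema changes onto semantic versioning. The sketch below assumes that convention. This page does not state dpm's actual versioning policy, and the `Schema` type and `classifySchemaChange` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical, simplified schema: column name -> type name.
type Schema = Record<string, string>;

type Bump = "major" | "minor" | "patch";

// Classify a schema change under semantic-versioning conventions (an assumption).
function classifySchemaChange(previous: Schema, next: Schema): Bump {
  for (const [column, type] of Object.entries(previous)) {
    // A removed or retyped column can break queries written against the old schema.
    if (!(column in next) || next[column] !== type) return "major";
  }
  // Every old column survived unchanged, so any difference is purely additive.
  const added = Object.keys(next).some((column) => !(column in previous));
  return added ? "minor" : "patch";
}

const v1: Schema = { appTitle: "string", panelistid: "string" };
const v2: Schema = { ...v1, starttimestamp: "timestamp" };
const bump = classifySchemaChange(v1, v2); // "minor": a column was added
```

Under this convention, consumers who pin a compatible version range pick up additive changes automatically and opt in to breaking ones explicitly.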

The package is imported as a library dependency into a code project. The client lets users write type-safe queries, with helpers for common date operations, aggregates, filters, and lookups. Each query is routed through an agent process, which translates it into the dialect of the underlying source.

import { FactsAppEngagement as FactsAppEngagementSnow } from 'snowflake-demo-package';

// Get avg time in app and user counts
// broken down by app and day of week
async function main() {
  const { appTitle, foregroundduration, panelistid, starttimestamp } = FactsAppEngagementSnow.fields;

  const query = FactsAppEngagementSnow.select(
    appTitle.as("App Name"),
    foregroundduration.avg().as("Avg Time in App"),
    panelistid.countDistinct().as("User Count"),
    starttimestamp.day.as("Day of week")
  );

  // Await both calls so that any failure reaches the catch handler below
  const compiled = await query.compile();
  console.log("Compiled query: ", compiled);

  const rows = await query.execute();
  console.log(rows);
}

main().catch(console.error);
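The translation step performed by the agent can be pictured with a small, self-contained sketch. This is not dpm's implementation. The `QuerySpec` shape, the `compile` function, and the `app_engagement` table with its columns are hypothetical, and the only dialect difference modeled is identifier quoting (double quotes for PostgreSQL, backticks for BigQuery).

```typescript
// Hypothetical declarative query description. Not dpm's internal representation.
type Aggregate = "avg" | "countDistinct";
type Dialect = "postgres" | "bigquery";

interface SelectedField {
  column: string;
  alias: string;
  aggregate?: Aggregate;
}

interface QuerySpec {
  table: string;
  fields: SelectedField[];
}

// Quote an identifier for the target dialect.
// Sketch only: embedded quote characters are not escaped.
function quoteIdent(name: string, dialect: Dialect): string {
  return dialect === "bigquery" ? "`" + name + "`" : `"${name}"`;
}

// Turn the declarative spec into a SQL string for one dialect.
function compile(spec: QuerySpec, dialect: Dialect): string {
  const q = (name: string) => quoteIdent(name, dialect);
  const select = spec.fields.map((f) => {
    const col = q(f.column);
    const expr =
      f.aggregate === "avg" ? `AVG(${col})`
      : f.aggregate === "countDistinct" ? `COUNT(DISTINCT ${col})`
      : col;
    return `${expr} AS ${q(f.alias)}`;
  });
  // Fields without an aggregate become the grouping keys.
  const groupBy = spec.fields.filter((f) => !f.aggregate).map((f) => q(f.column));
  let sql = `SELECT ${select.join(", ")} FROM ${q(spec.table)}`;
  if (groupBy.length > 0) sql += ` GROUP BY ${groupBy.join(", ")}`;
  return sql;
}

// Table and column names are made up for illustration.
const spec: QuerySpec = {
  table: "app_engagement",
  fields: [
    { column: "app_title", alias: "app_name" },
    { column: "foreground_duration", alias: "avg_time_in_app", aggregate: "avg" },
    { column: "panelist_id", alias: "user_count", aggregate: "countDistinct" },
  ],
};

const postgresSql = compile(spec, "postgres");
const bigquerySql = compile(spec, "bigquery");
```

The same spec yields two SQL strings that differ only in quoting, which is the property that lets application code stay unchanged when the underlying source changes.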

Learn more

To stay up to date with dpm, be sure to follow @patch_data and @dpminstall on Twitter/X!

If you have questions about anything related to dpm, you're welcome to ask on GitHub Discussions.