Quickstart
This tutorial will introduce you to about 90% of the concepts you'll need on a daily basis.
- How to run a query
- How to define a data package with a live connection to a data source
- How to publish your package for others in their preferred runtime language
- How to install with a package manager & import into a code base
- How to write and run advanced analytics queries and single row lookups
- How to update your data package
Install the dpm Client
If you'd prefer, you can find all of our binaries for download on our GitHub page.
- Mac
- Windows
Mac: download and install dpm
brew tap patch-tech/tap
brew install patch-tech/tap/dpm
Windows: download and install binaries
- Download the dpm CLI binary from GitHub; select the file ending in pc-windows-gnu.zip
- Rename the executable
rename dpm-0.0.1-x86_64-pc-windows-msvc.exe dpm.exe # in your downloads folder
- In the Start Menu, search for "Edit environment variables for your account"
- Select "Path", then click "Edit"
- Click "New" and add the path to the folder where you saved the executable
- Click "OK" and close the window
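Once the install steps are done, it can help to confirm that the dpm executable is actually reachable from your PATH before logging in. The following is a minimal sketch using only the Python standard library; the helper name find_cli is hypothetical and not part of dpm.

```python
import shutil
from typing import Optional


def find_cli(name: str = "dpm") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_cli()
    if location is None:
        print("dpm was not found on PATH; re-check the install steps above.")
    else:
        print(f"dpm found at {location}")
```

If the check reports that dpm is missing on Windows, open a new terminal window first, since changes to PATH only apply to newly started terminals.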
Log into dpm using the CLI
dpm login # in a new terminal window and follow the prompts to log in with GitHub
Run your first query
It's time to query data! Follow along in your preferred runtime language.
- Node.js
- Python
Create a new project
mkdir demo-project
cd demo-project
npm init -y
npm install ts-node
Run the following to build the demo package. It will be written to a ./dist folder in the current working directory.
dpm build-package -p "Snowflake Demo Package (fast)@0.1.0" nodejs # you can use the -o flag to specify a different output directory
Install the demo package. If you used the -o flag in step 2, be sure to use that pathname here.
npm install ./dist/nodejs/snowflake-demo-package-fast-0.1.0-0.2.0.tgz
Create a file to run the query. You're welcome to use your favorite editor, but here's a quick way to do it from the command line:
cat > first_query.ts # At the prompt paste the following snippet and Ctrl-D to save
Paste the following code into your terminal. Be sure to Ctrl-D to save.
import { FactsAppEngagement as FactsAppEngagementSnow } from 'snowflake-demo-package-fast';
// Get avg time in app and user counts
// broken down by app and day of week
async function main() {
  let { appTitle, foregroundduration, panelistid, starttimestamp } = FactsAppEngagementSnow.fields;
  let query = FactsAppEngagementSnow.select(
    appTitle.as("App_Name"),
    foregroundduration.avg().as("Avg_Time_in_App"),
    panelistid.countDistinct().as("User_Count"),
    starttimestamp.day.as("Day_of_week")
  );
  await query.compile().then((data) => console.log("Compiled query: ", data));
  await query.execute().then((data) => console.log(data));
}
main().catch(console.error);
Run the code!
npx ts-node first_query.ts
Windows users may need to use py instead of python3 in the following commands. Users of Python 2.x may need to use python instead of python3.
Create a Python project.
mkdir demo-project
cd demo-project
python3 -m venv .venv
source .venv/bin/activate
Run the following to build the demo package. It will be written to a ./dist folder in the current working directory.
dpm build-package -p "Snowflake Demo Package (fast)@0.1.0" python # you can use the -o flag to specify a different output directory
Install the demo package. Be sure to use the same output directory from step 1.
python -m pip install ./dist/python/snowflake-demo-package-fast@0.1.0.0.2.0
Create a file to run the query. You're welcome to use your favorite editor, but here's a quick way to do it from the command line:
cat > first_query.py # At the prompt paste the snippet and Ctrl-D to save
Paste the following code into your terminal. Be sure to Ctrl-D to save.
import asyncio
from pprint import pprint
from snowflake_demo_package_fast import FactsAppEngagement as FactsAppEngagement
# Get avg time in app and user counts
# broken down by app and day of week
async def query():
    [app_title, foregroundduration, panelistid, starttimestamp] = [
        FactsAppEngagement.fields.app_title,
        FactsAppEngagement.fields.foregroundduration,
        FactsAppEngagement.fields.panelistid,
        FactsAppEngagement.fields.starttimestamp
    ]
    query = FactsAppEngagement.select(
        app_title.with_alias("App_Name"),
        foregroundduration.avg().with_alias("Average_Time_in_App"),
        panelistid.count_distinct().with_alias("User_Count"),
        starttimestamp.day.with_alias("Day_of_Week")
    ).limit(10)
    compiled_query = await query.compile()
    results = await query.execute()
    print(f"Compiled query:\n{compiled_query}")
    print(f"Results:\n")
    pprint(results)

asyncio.run(query())
Run the script!
python first_query.py
When you're done running queries, deactivate your virtual environment.
deactivate
Create and query your own package
To get started, you will
- Connect to a data source
- Define a data package
- Build the package
- Publish the package
Create a source
dpm currently supports one source: Snowflake. Others will be supported soon.
- Navigate to a directory where you'd like to consume your Snowflake data. If you'd like to reuse the demo project, navigate to that directory.
cd ~/dpm-demo
- Run the following to create a Snowflake source. You will need to replace the <> with your own values. Try dpm source create snowflake --help for more information.
dpm source create snowflake --name <> --organization <> --account <> --database <> --user <> --password <>
Check out the docs on creating sources for more details.
Create your first package
- Run the following to generate a descriptor file called datapackage.json. Provide the source name from the previous step and give your package a name as well.
# replace PACKAGE_NAME with a display name for your package,
# SOURCE_NAME with the source name from the previous step, and
# TABLE_NAME with the name of a table in your source (you can pass the option multiple times)
dpm init --package-name PACKAGE_NAME SOURCE_NAME snowflake --table TABLE_NAME
- Publish the data package to dpm. This will make it reviewable on the Packages screen.
dpm publish # in the directory with the `datapackage.json` file
- Then build the client library from the datapackage.json file. You can find the version of the package in that file.
dpm build-package -p PACKAGE_NAME@version nodejs # run with `python` for a python client library
This will write the built artifact to ./dist/{target}/{versioned-package}/ by default, using the version value in your descriptor. You may override the output directory with the --out-dir <path> option on dpm build-package.
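Because the -p argument takes the form PACKAGE_NAME@version, it can be convenient to read both values straight from the descriptor rather than copying them by hand. The sketch below assumes the descriptor keeps them under top-level "name" and "version" keys; that layout is an assumption, so check your own datapackage.json before relying on it. The helper name package_ref is hypothetical.

```python
import json
from pathlib import Path


def package_ref(descriptor_path: str = "datapackage.json") -> str:
    """Build the NAME@VERSION string passed to `dpm build-package -p`."""
    descriptor = json.loads(Path(descriptor_path).read_text(encoding="utf-8"))
    # Assumption: name and version live in top-level "name" and "version" keys.
    return f'{descriptor["name"]}@{descriptor["version"]}'


if __name__ == "__main__":
    if Path("datapackage.json").exists():
        print(package_ref())
    else:
        print("No datapackage.json in the current directory.")
```

Remember to quote the resulting string when you pass it to the command line, as the demo package reference is quoted earlier in this tutorial, since display names may contain spaces.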
Import and run queries
- Share the library locally and import it into your project.
npm install ./dist/nodejs/YOUR_PACKAGE_NAME-1.0.0.tgz # for Python, run: python -m pip install ./dist/python/YOUR_PACKAGE_NAME-1.0.0.tar.gz
- Then, from a module in the same directory as the above command:
import { Table1, Table2, Table3 } from './YOUR_PACKAGE_NAME'; // In your project's source code
- Write and run some queries, using the template above if you need it.
See here for further details on sharing your package.
Update your package
Over time, you'll need to update your package. For example, you may need to evolve the schemas in your source backend.
The following command refreshes the contents of a descriptor. It uses build information stored in the datapackage.json file to recall the parameters that originally generated the descriptor (patch --dataset <dataset_name> or snowflake --table <table_name>... in the above examples). It also applies an appropriate version bump according to semantic versioning.
dpm update ./datapackage.json
The command shows a diff and prompts (y/n) by default; skip the prompt with -y. It then writes the new descriptor to $PWD/datapackage.json by default; override with -o <path>.
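For readers unfamiliar with semantic versioning, a version has the form MAJOR.MINOR.PATCH: a major bump signals a breaking change, a minor bump a backward-compatible addition, and a patch bump a backward-compatible fix. The sketch below only illustrates that arithmetic. How dpm update decides which kind of bump a given schema change deserves is not described here, and the function bump_version is a hypothetical helper, not part of dpm.

```python
def bump_version(version: str, change: str) -> str:
    """Return `version` advanced by one semantic-versioning step.

    `change` must be "major", "minor", or "patch".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible addition: reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("0.1.0", "minor"))  # 0.2.0
```

After an update, the version in your descriptor changes, so use the new value in PACKAGE_NAME@version when you rebuild.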
From there, you can proceed with the same build workflow as before.