Version: latest

SDF Quickstart

Provisioning and operating a Stateful Dataflow is simple and only requires two prerequisites:

A Fluvio Cluster to enable dataflows to consume and produce streaming data.
A Dataflow File to define how a dataflow sources, transforms, and emits data.

You can build, test, and run Stateful Dataflows locally, then deploy them to your InfinyOn Cloud cluster.

In-line Dataflows

Dataflows can be defined in dataflow.yaml files. When prototyping, a dataflow.yaml can be composed of in-line code, making it the single asset required to define a dataflow.

Deploying an in-line dataflow is simple:

Download (or create) a dataflow file
Run the dataflow

While in-line dataflows are a breeze to get started with, maintaining code in YAML is not always ideal. For complex projects, we recommend using Composable Dataflows.

Create and Run a Dataflow

Let's create a simple in-line dataflow which receives a sentence, splits it into words, and outputs the length of each word.

1. Installing the SDF CLI

Dataflows are managed via the SDF CLI that we install using fvm.

fvm install sdf-beta11

2. Create the Dataflow file

Create the dataflow file in a directory named word-length:

$ mkdir -p word-length
$ cd word-length
$ touch dataflow.yaml

Add the following content to the dataflow.yaml:

apiVersion: 0.5.0

meta:
  name: word-length
  version: 0.1.0
  namespace: example

config:
  converter: raw

topics:
  sentences:
    schema:
      value:
        type: string
        converter: raw
  words:
    schema:
      value:
        type: string
        converter: raw

services:
  calc-word-length:
    sources:
      - type: topic
        id: sentences

    transforms:
      - operator: flat-map
        run: |
          fn sentence_to_words(sentence: String) -> Result<Vec<String>> {
            Ok(sentence.split_whitespace().map(String::from).collect())
          }
      - operator: map
        run: |
          pub fn word_length(word: String) -> Result<String> {
            Ok(format!("{}({})", word, word.chars().count()))
          }

    sinks:
      - type: topic
        id: words

This dataflow.yaml first declares a version for the dataflow configuration structure. It then defines a default record converter, "raw" instead of "json" in this case. Next, it lists two Fluvio Topics which the dataflow expects to be present, with an expected record schema. SDF will create these Topics if they do not already exist. Finally, the config defines a Service which will read from a source topic and write to a sink topic. The service uses two Operators, in this case defined in-line in Rust, to perform transformations on the data.

3. Run the Dataflow

Use the sdf CLI to run the dataflow. This will start a REPL which we can use to communicate with the dataflow.

$ sdf run --ui

Note: When passed to sdf run, the --ui flag will start a local webserver allowing you to view the graphical representation of the dataflow on SDF Studio.

4. Test the Dataflow

First, let's use Fluvio to consume from the words topic so we can see the output of the dataflow in real time:

$ fluvio consume words

Then use Fluvio to produce sentences to the sentences topic:

$ fluvio produce sentences

Enter the following strings into the producer REPL:

Hello world
Hi there

You should see the following output in the consumer stream:

Hello(5)
world(5)
Hi(2)
there(5)

5. Show State

Stateful Dataflows are capable of maintaining states (data values) in durable storage, which like a database, will persist when an SDF session ends. You can define arbitrary state values which can be accessed and updated in your dataflow. SDF also maintains some built-in state values which keep track of the dataflow's status.

To view the default metrics for the sentence-to-words operator, use the show state command:

>> show state calc-word-length/sentence-to-words/metrics
 Key    Window  succeeded  failed
 stats  *       2          0

View the metrics for the word-length operator:

>> show state calc-word-length/word-length/metrics
 Key    Window  succeeded  failed
 stats  *       4          0

Congratulations! You've successfully built and run a dataflow! Many more examples are available on Github.

6. Clean-up

Exit the sdf terminal and then remove the topics we created:

sdf clean --force

Note: The --force option should only be used if you want to remove everything, including the topics created by this dataflow.

In-line Dataflows

Create and Run a Dataflow​

1. Installing the SDF CLI​

2. Create the Dataflow file​

3. Run the Dataflow​

4. Test the Dataflow​

5. Show State​

6. Clean-up​