Fluvio is a distributed, programmable streaming platform written in Rust.
Welcome to the 51st edition of this week in Fluvio.
For this edition we have more details on Community Contribution
Carson Rajcan presented to our team a really cool contribution to the Fluvio OSS.
In the project Carson developed the functionality to apply SmartModule transformation for Producer - PR #3014
Here is Carson’s experience…
Get the Stream Processing Unit to apply SmartModule transformations for the Producer requests coming from the CLI (before commit).
At present, Fluvio SmartModules are only applied for Consumer requests after being read from disk. Shaping the data before it enters the topic spares the Consumer or other downstream services from this burden.
While investigating another issue, I had recently gained some insight into how the Stream Processing Unit handles Producer requests, as well as how it uses the Fluvio Smart Engine to transform records for the Consumer.
Critically, I noticed that the Smart Engine was well encapsulated – it was going to be easy to repurpose.
Also that the Producer request handler was well organized, it wasn’t going to be a nightmare to plug in some additional functionality.
Where was I going to start? Well, TDD is my friend.
There were plenty of test cases for Consumer SmartModule transformations that used the CLI. All I had to do was move the SmartModule options from the Consumer commands over to the Producer commands.
In this example, we are processing email addresses on the Consumer side.
Our test email is
FooBar@test.com (emphasis on the usage of capital letters)
echo "FooBar@test.com" | fluvio produce emails fluvio consume emails -B -d --smartmodule lowercase
In this example, we are still processing email addresses, but now it is possible to do on the Producer side.
echo "FooBar@test.com" | fluvio produce emails --smartmodule lowercase fluvio consume emails -B -d
Both result in the same output with the email address using only lowercase letters:
With my TDD workflow ready to go, I made the changes from the outside-in.
Learning to translate between types in someone else’s codebase can be challenging.
There are many types to become familiar with in the Stream Processing Unit. Notably, the SPU uses a few different
Batch types to model using different Record types (Record, RawRecords, MemoryRecords).
You end up seeing
Batch<MemoryRecords> … quite often. It took me a while to figure out when and where to use each, and how to convert between them.
What happens when a Producer sends compressed records and requests the SPU performs a SmartModule Transformation?
The records must be decompressed so they can be fed to the SmartEngine, then compressed again before storage. To pull this off I had to dig up the code that performs the compression, decompression and figure out how to utilize it while handling Producer requests.
Thank you for your feedback on Discord. We are working on a public road map that should be out soon.
Keep the feedback flowing in our Discord channel and let us know if you’d like to see the video of Carson walking through the code.
For the full list of changes this week, be sure to check out our CHANGELOG.