Expense journal with Cloudflare AI


This is a submission for the Cloudflare AI Challenge.

What I Built

This project is a serverless Cloudflare Worker with a Workers AI binding. It processes audio, images or text submitted by users into semantically useful information, specifically for submitting expenses.

For users who prefer a UI, I also built a simple Next.js web app. It demonstrates how to upload files and uses API routes to make a POST request to the Cloudflare Worker endpoint.
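The web app's code isn't included in this post, so the following is only a minimal sketch of how such an API route could forward an upload to the worker. It assumes a Next.js App Router route handler (a hypothetical app/api/expense/route.ts), a form field named file, and a hypothetical CF_WORKER_TOKEN environment variable holding the bearer token. The upload's MIME type decides the worker's type query parameter.

```typescript
// Hypothetical Next.js App Router handler, e.g. app/api/expense/route.ts.
// Assumptions: the form field is named "file" and the bearer token lives in
// a CF_WORKER_TOKEN environment variable.
const WORKER_URL = "https://cf-journal.senchatea.workers.dev";

export type InputType = "audio" | "image" | "text";

// Map the upload's MIME type onto the worker's ?type= query parameter.
export function inputTypeFromMime(mime: string): InputType | null {
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("text/")) return "text";
  return null;
}

function jsonError(message: string, status: number): Response {
  return new Response(JSON.stringify({ error: message }), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

export async function POST(request: Request): Promise<Response> {
  const form = await request.formData();
  const file = form.get("file");
  if (!(file instanceof Blob)) return jsonError("missing file", 400);

  const type = inputTypeFromMime(file.type);
  if (type === null) return jsonError(`unsupported type: ${file.type}`, 415);

  // Text goes up as a string; audio and images go up as raw bytes.
  const upstream = await fetch(`${WORKER_URL}?type=${type}`, {
    method: "POST",
    headers: {
      "Content-Type": type === "text" ? "text/plain" : "application/octet-stream",
      Authorization: `Bearer ${process.env.CF_WORKER_TOKEN ?? ""}`,
    },
    body: type === "text" ? await file.text() : await file.arrayBuffer(),
  });

  // Relay the worker's status and body back to the browser unchanged.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "Content-Type": upstream.headers.get("Content-Type") ?? "application/json",
    },
  });
}
```

The request shapes mirror the curl examples in the Demo section: text/plain for text, application/octet-stream for audio and images.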

How it works

Models used in this project (all from Cloudflare Workers AI):

Image-to-text: @cf/unum/uform-gen2-qwen-500m

Automatic Speech Recognition: @cf/openai/whisper

Text Generation: @hf/thebloke/mistral-7b-instruct-v0.1-awq
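A plausible way to chain these three models is to first turn audio or images into text (Whisper for speech, UForm-Gen2 for images) and then ask Mistral to extract the expense details from that text. The sketch below illustrates that pipeline under stated assumptions: the AI binding is named AI in wrangler.toml, the model input and output fields (audio, image, prompt, messages; text, description, response) follow the Workers AI model docs at the time of writing, and both prompts are placeholders rather than the real ones in utils.ts. AiBinding is a local stand-in for the Ai type from @cloudflare/workers-types so the snippet compiles on its own.

```typescript
// Local stand-in for the Workers AI binding (the real type is Ai from
// @cloudflare/workers-types), so the sketch compiles on its own.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<any>;
}

interface Env {
  AI: AiBinding; // assumes an AI binding named "AI" in wrangler.toml
}

const IMAGE_MODEL = "@cf/unum/uform-gen2-qwen-500m";
const ASR_MODEL = "@cf/openai/whisper";
const TEXT_MODEL = "@hf/thebloke/mistral-7b-instruct-v0.1-awq";
const SUPPORTED_TYPES = ["text", "audio", "image"];

// Step 1: normalise every input type to plain text.
async function toText(env: Env, type: string, request: Request): Promise<string> {
  if (type === "text") return request.text();
  const bytes = Array.from(new Uint8Array(await request.arrayBuffer()));
  if (type === "audio") {
    const result = await env.AI.run(ASR_MODEL, { audio: bytes });
    return result.text;
  }
  // Only "image" is left here, because fetch() validates the type first.
  const result = await env.AI.run(IMAGE_MODEL, {
    image: bytes,
    prompt: "Describe this receipt, including the shop, items and total paid.", // placeholder
  });
  return result.description;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") return new Response("POST only", { status: 405 });
    const type = new URL(request.url).searchParams.get("type") ?? "";
    if (!SUPPORTED_TYPES.includes(type)) {
      return new Response(`unsupported type: ${type}`, { status: 400 });
    }

    const text = await toText(env, type, request);

    // Step 2: ask the LLM to pull the expense details out of the text (placeholder prompt).
    const result = await env.AI.run(TEXT_MODEL, {
      messages: [
        {
          role: "system",
          content: "Extract the amount, currency, merchant, date and category of the expense described by the user.",
        },
        { role: "user", content: text },
      ],
    });

    return new Response(JSON.stringify({ input: text, expense: result.response }), {
      headers: { "Content-Type": "application/json" },
    });
  },
};

export default worker;
```

Converting everything to text first means only one prompt has to deal with expense extraction, whatever the original input type was.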

Demo

Try out the worker!

The worker is deployed at https://cf-journal.senchatea.workers.dev.

Text input

curl --location 'https://cf-journal.senchatea.workers.dev?type=text' \
--header 'Content-Type: text/plain' \
--data '2 weeks ago, I went to eat a buffet at Swensens Unlimited at the T2 airport, it'\''s really nice but it costs like $36 per person after GST, and there were 2 of us.'

Audio input

curl --location 'https://cf-journal.senchatea.workers.dev?type=audio' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer pYrzMvsyURxsCUeaQDsa3lSO_tBDQEuiPB3iLEQt' \
--data-binary '@recording.mp3'

Replace recording.mp3 with the path to a local audio file.

Image input

curl --location 'https://cf-journal.senchatea.workers.dev?type=image' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer pYrzMvsyURxsCUeaQDsa3lSO_tBDQEuiPB3iLEQt' \
--data-binary '@receipt.jpg'

Replace receipt.jpg with the path to a local image file.

Try it in the web app!

https://cf-journal.vercel.app/

My Code

The project has 2 folders:

serverless: code deployed to the Cloudflare Worker. The text generation prompt and the code that calls the AI models are in utils.ts; the worker's request handler is in main.ts.

app: Next.js app code deployed to Vercel.
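The real prompt lives in utils.ts and isn't reproduced here. As a hypothetical illustration of one common approach, the helpers below ask the text generation model for JSON only (passing today's date so phrases like "2 weeks ago" can be resolved) and then defensively parse the outermost {...} span of the reply, since LLMs often wrap JSON in extra prose. The field names and helper names are assumptions.

```typescript
// Hypothetical fields an expense form might auto-fill (names are assumptions).
interface ExpenseDraft {
  description?: string;
  amount?: number;
  currency?: string;
  date?: string;
  category?: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Ask for JSON only; passing today's date lets the model resolve "2 weeks ago".
function buildExpenseMessages(input: string, today: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        `Today is ${today}. Extract a single expense from the user's message and reply with JSON only, ` +
        `using the keys description, amount (total paid, as a number), currency, date (YYYY-MM-DD) and category. ` +
        `Leave out any key you cannot infer.`,
    },
    { role: "user", content: input },
  ];
}

// Accept numbers, or strings such as "$72" or "1,234.50".
function toNumber(value: unknown): number | undefined {
  if (typeof value === "number" && Number.isFinite(value)) return value;
  if (typeof value === "string") {
    const cleaned = value.replace(/[^0-9.-]/g, "");
    const n = Number(cleaned);
    if (cleaned !== "" && Number.isFinite(n)) return n;
  }
  return undefined;
}

function optionalString(value: unknown): string | undefined {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;
}

// Parse the outermost {...} span and keep only well-typed fields.
function parseExpenseDraft(response: string): ExpenseDraft | null {
  const start = response.indexOf("{");
  const end = response.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let raw: any;
  try {
    raw = JSON.parse(response.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  return {
    description: optionalString(raw.description),
    amount: toNumber(raw.amount),
    currency: optionalString(raw.currency),
    date: optionalString(raw.date),
    category: optionalString(raw.category),
  };
}
```

In a worker, the output of buildExpenseMessages would be passed as the messages input to the text generation model, and parseExpenseDraft applied to the response field of its result.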

Journey

How the idea came about

Before OpenAI introduced ChatGPT and revitalized the AI scene, I created Billy, a cute little expense tracker app to help my mom track her expenses more easily. It was well received by my friends, both online and offline, thanks to its clean and straightforward UI. However, user retention wasn't very good: logging an expense was tedious because of the many fields to fill in, and it was very difficult to get users into the habit of logging their expenses.

For many users, talking or taking a picture with a phone is much more direct and easier than typing. But back then it was difficult to extract information from these input types, so I eventually dropped the project.

Now, by utilizing different AI models, we can interpret and organize important information for them more easily, simplifying the expense submission process.

Room for improvements

I'm very new to AI, so there is definitely a lot to improve!

The web app in this submission is a very early MVP of what I envision the actual expense tracker app to be.

For now, it only creates expenses based on the information in the submitted file (audio, image or text).
Ideally, you should still be able to modify the individual fields of the expense form after submission; the AI is only there to assist with auto-filling.
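One way to keep the AI in an assisting role is to merge its suggestions into the form while never overwriting a field the user has already edited. This is a hypothetical sketch; ExpenseForm, ExpenseDraft, setField and autofill are illustrative names, not part of the current app.

```typescript
// Hypothetical expense form state; field names are illustrative only.
interface ExpenseForm {
  description: string;
  amount: number | null;
  currency: string;
  date: string;
  category: string;
}

// What the AI suggests: any subset of the form's fields.
type ExpenseDraft = Partial<ExpenseForm>;

// Typed setter so a union of keys can be assigned safely.
function setField<K extends keyof ExpenseForm>(form: ExpenseForm, key: K, value: ExpenseForm[K]): void {
  form[key] = value;
}

// Copy AI suggestions into the form, but never overwrite a field the user edited.
function autofill(
  form: ExpenseForm,
  draft: ExpenseDraft,
  touched: ReadonlySet<keyof ExpenseForm>,
): ExpenseForm {
  const next: ExpenseForm = { ...form }; // leave the caller's object untouched
  for (const key of Object.keys(draft) as (keyof ExpenseForm)[]) {
    const value = draft[key];
    if (value === undefined || touched.has(key)) continue;
    setField(next, key, value);
  }
  return next;
}
```

Tracking "touched" fields separately from their values means a user can deliberately clear a field without the next AI suggestion filling it back in.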

I'm also not very familiar with prompt engineering, and I would like to improve the text generation prompt in the future.

Resources

These are some helpful Cloudflare resources I used while working on this project, since I wasn't familiar with the AI models available for text generation and inference:

Workers AI LLM Playground
Guide on choosing the right text generation model

My project is eligible for the Triple Task type category.
