Quickstart - Plum AI Docs

Datasets

To get started, Plum AI needs to start ingesting a dataset of inputs to your LLM and outputs from your LLM. This dataset is used to evaluate your application, which will drive the improvements you’ll make. There are multiple ways to upload data to Plum:

Upload a file containing the data
Use the Python SDK: pip install plum-sdk
Use our API. See the API reference here.

Here’s how to upload a JSON file directly to Plum: Upload a dataset

Evaluation workflow

Generate evaluation metrics

Based on the uploaded data and system prompt, Plum AI can get you started quantitatively evaluating your outputs. To get started, click on “Generate Evaluation Metrics” for a set of metrics tailored for your specific use case. Generate metrics

You can edit these metrics to modify them or add your own. Show metrics

Use generated metrics

Next, you can run an evaluation on your dataset using these metrics. This will give you a quantitative understanding of how well your LLM is performing based on the data provided. Run evaluations

Click “Run Evaluation” and Plum AI will provide a statistical analysis of your outputs within seconds. Evaluation results

You now have a snapshot of your LLM application’s performance, which you can track over time. Not only that, but Plum AI also provides you with reasons why and how your LLM is underperforming on specific metrics. This allows you to iterate and improve particular aspects of your LLM performance over time.

Fine-tuning workflow

Generate synthetic data driven by the evaluation scores

Now that you have evaluations tailored to your preferences, you may want to fine-tune your LLM. Providers such as OpenAI and Anthropic allow you to fine-tune models based on a dataset of positive examples you upload to their platform. Plum AI can leverage your evaluation results and provide you with the exact right data to fine-tune a model. Choose the size of synthetic dataset you want to generate from your initial seed dataset. For reference, fine-tuning a model requires around at least 100 examples. Click the “Generate” button to generate synthetic data based on evaluation scores. Augment

Click on “Download in OpenAI’s .jsonl format” to download the synthetic dataset in the format required by OpenAI’s platform. Download jsonl

Upload the synthetic data to a major LLM provider like OpenAI’s fine-tuning API

Once you have a train.jsonl from Step 5, you could optionally create another file, validation.jsonl, using real heldout data that you haven’t used in the seed dataset.

Go to the OpenAI fine-tuning page: https://platform.openai.com/finetune
Click “Create”.
Upload new training data.

After around 15 minutes, the fine-tuning run completes, and OpenAI will provide a customized model ID that you can start using. Congratulations! You’ve completed one round of fine-tuning using Plum AI. Unlock your data flywheel: generate a new set of data using your fine-tuned model, create synthetic data using Plum AI, and start another round of fine-tuning.

Get Started

API

​Sign up for an account

​Datasets

​Evaluation workflow

​Generate evaluation metrics

​Use generated metrics

​Fine-tuning workflow

​Generate synthetic data driven by the evaluation scores

​Upload the synthetic data to a major LLM provider like OpenAI’s fine-tuning API

Sign up for an account