Benchmarking

| Objective | Goals | Tools | Models | Expected Output | Agent Output | Tested | Doesn’t work | Somewhat Works | Works Perfect |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Writing Blog | Write a 1000-word blog with 2 images. | Searx Toolkit, Read File, Write File, Stable Diffusion Toolkit | GPT-3.5 | 900-1000 word blog in a .txt file along with 2 images. | 350-word blog in a .txt file along with 2 images | Yes | No | Yes | No |
| Tweeting | Do Twitter Marketing by automatically creating 1 tweet every day about a specific theme. | Twitter Toolkit, Stable Diffusion Toolkit | GPT-4 | Tweet with Image | Agent tweeted successfully with a relevant image | Yes | No | Yes | No |
| Posting Instagram | Do Instagram Marketing by automatically creating 1 post every day about a specific theme. | Instagram Toolkit, Stable Diffusion Toolkit | GPT-4 | Instagram Post with Whey Muscle Protein Powder. | Agent posted an image on Instagram with captions and image of generic whey protein from Stable Diffusion | Yes | No | Yes | No |
| Analytics report | Collect Weekly Analytics Report from Google Analytics and Mail it every Monday at 8 am. | Google Analytics Toolkit, Email Toolkit, Write File | GPT-4 | .txt report with the collected data, which is mailed to me. | To be tested | No | No | No | No |
| Sales prospecting | Create a list of sales managers’ contacts from specific companies in a CSV and send personalized mail. | Tools not built | GPT-4 | Maintains a CSV of leads that tracks the mails sent. | To be tested | No | No | No | No |
| Google Calendar Analysis | Analyze calendar meeting invites for the day and provide an analysis via Email. | Google Calendar Toolkit, Email Toolkit, Write File, Read File | GPT-4 | The agent should provide a rundown of meetings and time availability. | The agent got stuck in a loop while making sense of events | Yes | Yes | No | No |
| Investment Research | Send personalized mails to potential prospects. | Tools not built | GPT-4 | To be tested | No | No | No | No | No |
| Email Summary | Provide a summary of unread mails every workday via Scheduled Agent Run and Recurring Run. | Email Toolkit, Write File, Read File | GPT-4 | Mail to me or anyone with a summary of all unread emails. | Agent was able to make a summary of at least 5 emails and send it | Yes | Yes | Yes | No |
| Email Automatic Reply | Provide automatic draft replies to all unread mails via Scheduled Agent Run and Recurring Run. | Email Toolkit, Write File, Read File | GPT-4 | Generate a draft email with an appropriate reply for all recipients. | Agent was able to save three emails as drafts with appropriate replies | Yes | Yes | Yes | No |
| Automatic Jira Ticket Creation | Create a Jira ticket automatically when a GitHub Issue is created. | Jira Toolkit, GitHub Toolkit | GPT-4 | To generate a Jira Ticket with the description when a GitHub Issue is created | Jira tickets are working, but not connected to GitHub Tool | Yes | Yes | No | No |