kun432 · September 30, 2025 17:56
diff --git a/gistfile1.txt b/gistfile1.txt
 2025/09/30 13:12:00 INFO dspy.teleprompt.mipro_optimizer_v2: 
 RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
 num_trials: 20
 minibatch: True
 num_fewshot_candidates: 6
 num_instruct_candidates: 3
 valset size: 100

 2025/09/30 13:12:00 INFO dspy.teleprompt.mipro_optimizer_v2: 
 ==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
 2025/09/30 13:12:00 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

 2025/09/30 13:12:00 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...
 Bootstrapping set 1/6
 Bootstrapping set 2/6
 Bootstrapping set 3/6
 17%|█▋        | 17/100 [03:59<19:29, 14.09s/it]
 Bootstrapped 4 full traces after 17 examples for up to 1 rounds, amounting to 17 attempts.
 Bootstrapping set 4/6
  5%|▌         | 5/100 [01:39<31:36, 19.96s/it]
 Bootstrapped 3 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
 Bootstrapping set 5/6
  8%|▊         | 8/100 [01:33<17:50, 11.64s/it]
 Bootstrapped 1 full traces after 8 examples for up to 1 rounds, amounting to 8 attempts.
 Bootstrapping set 6/6
  9%|▉         | 9/100 [02:27<24:47, 16.34s/it]
 Bootstrapped 1 full traces after 9 examples for up to 1 rounds, amounting to 9 attempts.
 2025/09/30 13:21:40 INFO dspy.teleprompt.mipro_optimizer_v2: 
 ==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
 2025/09/30 13:21:40 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
 2025/09/30 13:22:35 INFO dspy.teleprompt.mipro_optimizer_v2: 
 Proposing N=3 instructions...

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `question`, produce the fields `answer`.

 You are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far.
 Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.

 To do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task.
 After each tool call, you receive a resulting observation, which gets appended to your trajectory.

 When writing next_thought, you may reason about the current situation and plan for future steps.
 When selecting the next_tool_name and its next_tool_args, the tool must be one of:

 (1) search_wikipedia. It takes arguments {'query': {'type': 'string'}}.
 (2) finish, whose description is <desc>Marks the task as complete. That is, signals that all information for producing the outputs, i.e. `answer`, are now available to be extracted.</desc>. It takes arguments {}.
 When providing `next_tool_args`, the value inside the field must be in JSON format

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are a highly skilled information retrieval agent operating in a high-stakes situation where a live audience is waiting for your answer. Given the fields `question`, systematically employ your reasoning and acting capabilities to navigate through the required tools to gather accurate information. Make sure to interleave your `next_thought`, `next_tool_name`, and `next_tool_args` judiciously in each turn, leveraging your trajectory for optimal decision-making. Your tools include `search_wikipedia` for deep dives into available information and `finish` for signing off once you have compiled a complete answer.

 Remember, your responses are critical: provide concise and informative outputs to ensure the audience receives an accurate answer to the given question.

 Your mission: Help the audience understand the complexities behind the questions posed effectively and efficiently!

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 2: Given the fields `question` and `trajectory`, your objective is to process the information provided to produce the fields `next_thought`, `next_tool_name`, and `next_tool_args`. You will act as an Agent who utilizes a "Reasoning and Acting" approach to determine the best method to gather information necessary to answer the question.

 1. Begin by analyzing the `question` and your current `trajectory`. Formulate a logical `next_thought` based on what you need to learn or confirm.
 2. Choose the appropriate tool for gathering information:
   - If additional information is required, select `search_wikipedia` as `next_tool_name` and prepare your `next_tool_args` with a relevant query in JSON format that succinctly describes your search needs.
   - If you believe you possess enough information to produce an answer, select `finish` as `next_tool_name` with empty `next_tool_args`.
 3. Follow this process iteratively until an answer is obtained.

 You must ensure your responses are coherent and that you reason through each decision, effectively utilizing the tools available to you for optimal information retrieval.

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 1:

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `question`, produce the fields `answer`.

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Imagine you are an agent in a high-stakes trivia competition. You are tasked with answering a series of challenging questions about cultural, historical, and geographical topics. Your mission is to provide accurate answers quickly to beat the clock and secure your victory. Given the fields `question`, use your reasoning abilities to gather information and produce the fields `answer`. Remember, you're competing against time, so think critically about your next steps and utilize the tools at your disposal to deliver precise responses efficiently.

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 2: Given the question about a particular topic, you are to investigate relevant facts using the tools provided, notably the search function. Your task includes reasoning step by step about the information you gather and finally presenting a clear and concise answer based on the retrieved data. Follow the structured process of querying, observing the results, and incorporating them into your response until you gather enough information to complete the answer.

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: 

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

 2025/09/30 13:24:24 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 1 / 25 - Full Evaluation of Default Program ==
 Average Metric: 31.00 / 100 (31.0%): 100%|██████████| 100/100 [01:03<00:00,  1.58it/s]
 2025/09/30 13:25:27 INFO dspy.evaluate.evaluate: Average Metric: 31 / 100 (31.0%)
 2025/09/30 13:25:27 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 31.0

 /usr/local/lib/python3.12/dist-packages/optuna/_experimental.py:32: ExperimentalWarning: Argument ``multivariate`` is an experimental feature. The interface can change in the future.
  warnings.warn(
 2025/09/30 13:25:27 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 25 - Minibatch ==

 Average Metric: 3.00 / 35 (8.6%): 100%|██████████| 35/35 [00:42<00:00,  1.22s/it]
 2025/09/30 13:26:10 INFO dspy.evaluate.evaluate: Average Metric: 3 / 35 (8.6%)
 2025/09/30 13:26:10 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 8.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].
 2025/09/30 13:26:10 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57]
 2025/09/30 13:26:10 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0]
 2025/09/30 13:26:10 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 31.0
 2025/09/30 13:26:10 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:26:10 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 25 - Minibatch ==

 Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:28<00:00,  1.24it/s]
 2025/09/30 13:26:39 INFO dspy.evaluate.evaluate: Average Metric: 16 / 35 (45.7%)
 2025/09/30 13:26:39 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
 2025/09/30 13:26:39 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71]
 2025/09/30 13:26:39 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0]
 2025/09/30 13:26:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 31.0
 2025/09/30 13:26:39 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:26:39 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 25 - Minibatch ==

 Average Metric: 5.00 / 35 (14.3%): 100%|██████████| 35/35 [00:27<00:00,  1.29it/s]
 2025/09/30 13:27:06 INFO dspy.evaluate.evaluate: Average Metric: 5 / 35 (14.3%)
 2025/09/30 13:27:06 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 14.29 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].
 2025/09/30 13:27:06 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29]
 2025/09/30 13:27:06 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0]
 2025/09/30 13:27:06 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 31.0
 2025/09/30 13:27:06 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:27:06 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 25 - Minibatch ==

 Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:28<00:00,  1.23it/s]
 2025/09/30 13:27:35 INFO dspy.evaluate.evaluate: Average Metric: 17 / 35 (48.6%)
 2025/09/30 13:27:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 4'].
 2025/09/30 13:27:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57]
 2025/09/30 13:27:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0]
 2025/09/30 13:27:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 31.0
 2025/09/30 13:27:35 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:27:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 25 - Minibatch ==

 Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:45<00:00,  1.30s/it]
 2025/09/30 13:28:21 INFO dspy.evaluate.evaluate: Average Metric: 17 / 35 (48.6%)
 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57]
 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0]
 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 31.0
 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 25 - Full Evaluation =====
 2025/09/30 13:28:21 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 48.57) from minibatch trials...

 Average Metric: 46.00 / 100 (46.0%): 100%|██████████| 100/100 [00:42<00:00,  2.38it/s]
 2025/09/30 13:29:03 INFO dspy.evaluate.evaluate: Average Metric: 46 / 100 (46.0%)
 2025/09/30 13:29:03 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 46.0
 2025/09/30 13:29:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0]
 2025/09/30 13:29:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:29:03 INFO dspy.teleprompt.mipro_optimizer_v2: =======================
 2025/09/30 13:29:03 INFO dspy.teleprompt.mipro_optimizer_v2: 

 2025/09/30 13:29:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 25 - Minibatch ==

 Average Metric: 11.00 / 35 (31.4%): 100%|██████████| 35/35 [00:28<00:00,  1.22it/s]
 2025/09/30 13:29:32 INFO dspy.evaluate.evaluate: Average Metric: 11 / 35 (31.4%)
 2025/09/30 13:29:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 31.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 0'].
 2025/09/30 13:29:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43]
 2025/09/30 13:29:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0]
 2025/09/30 13:29:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:29:32 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:29:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 25 - Minibatch ==

 Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:43<00:00,  1.24s/it]
 2025/09/30 13:30:16 INFO dspy.evaluate.evaluate: Average Metric: 16 / 35 (45.7%)
 2025/09/30 13:30:16 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 1'].
 2025/09/30 13:30:16 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71]
 2025/09/30 13:30:16 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0]
 2025/09/30 13:30:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:30:16 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================


 2025/09/30 13:30:16 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 25 - Minibatch ==

 Average Metric: 7.00 / 35 (20.0%): 100%|██████████| 35/35 [00:27<00:00,  1.26it/s]
 2025/09/30 13:30:44 INFO dspy.evaluate.evaluate: Average Metric: 7 / 35 (20.0%)
 2025/09/30 13:30:44 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 20.0 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 0'].
 2025/09/30 13:30:44 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0]
 2025/09/30 13:30:44 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0]
 2025/09/30 13:30:44 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:30:44 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:30:44 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 11 / 25 - Minibatch ==

 Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:28<00:00,  1.22it/s]
 2025/09/30 13:31:13 INFO dspy.evaluate.evaluate: Average Metric: 16 / 35 (45.7%)
 2025/09/30 13:31:13 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 4'].
 2025/09/30 13:31:13 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71]
 2025/09/30 13:31:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0]
 2025/09/30 13:31:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:31:13 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:31:13 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 25 - Minibatch ==

 Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:58<00:00,  1.67s/it]
 2025/09/30 13:32:12 INFO dspy.evaluate.evaluate: Average Metric: 17 / 35 (48.6%)
 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 2'].
 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57]
 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0]
 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 25 - Full Evaluation =====
 2025/09/30 13:32:12 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 48.57) from minibatch trials...

 Average Metric: 46.00 / 100 (46.0%): 100%|██████████| 100/100 [00:33<00:00,  2.95it/s]
 2025/09/30 13:32:46 INFO dspy.evaluate.evaluate: Average Metric: 46 / 100 (46.0%)
 2025/09/30 13:32:46 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0]
 2025/09/30 13:32:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:32:46 INFO dspy.teleprompt.mipro_optimizer_v2: =======================
 2025/09/30 13:32:46 INFO dspy.teleprompt.mipro_optimizer_v2: 

 2025/09/30 13:32:46 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 14 / 25 - Minibatch ==

 Average Metric: 14.00 / 35 (40.0%): 100%|██████████| 35/35 [00:25<00:00,  1.38it/s]
 2025/09/30 13:33:12 INFO dspy.evaluate.evaluate: Average Metric: 14 / 35 (40.0%)
 2025/09/30 13:33:12 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 4'].
 2025/09/30 13:33:12 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0]
 2025/09/30 13:33:12 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0]
 2025/09/30 13:33:12 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:33:12 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:33:12 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 15 / 25 - Minibatch ==

 Average Metric: 20.00 / 35 (57.1%): 100%|██████████| 35/35 [00:35<00:00,  1.02s/it]
 2025/09/30 13:33:48 INFO dspy.evaluate.evaluate: Average Metric: 20 / 35 (57.1%)
 2025/09/30 13:33:48 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 57.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 5'].
 2025/09/30 13:33:48 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14]
 2025/09/30 13:33:48 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0]
 2025/09/30 13:33:48 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:33:48 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:33:48 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 16 / 25 - Minibatch ==

 Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:33<00:00,  1.04it/s]
 2025/09/30 13:34:22 INFO dspy.evaluate.evaluate: Average Metric: 15 / 35 (42.9%)
 2025/09/30 13:34:22 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 5'].
 2025/09/30 13:34:22 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86]
 2025/09/30 13:34:22 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0]
 2025/09/30 13:34:22 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:34:22 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:34:22 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 17 / 25 - Minibatch ==

 Average Metric: 12.00 / 35 (34.3%): 100%|██████████| 35/35 [00:27<00:00,  1.27it/s]
 2025/09/30 13:34:50 INFO dspy.evaluate.evaluate: Average Metric: 12 / 35 (34.3%)
 2025/09/30 13:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 34.29 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 5'].
 2025/09/30 13:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29]
 2025/09/30 13:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0]
 2025/09/30 13:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 18 / 25 - Minibatch ==

 Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:31<00:00,  1.12it/s]
 2025/09/30 13:35:21 INFO dspy.evaluate.evaluate: Average Metric: 16 / 35 (45.7%)
 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 5'].
 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29, 45.71]
 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0]
 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 46.0
 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 25 - Full Evaluation =====
 2025/09/30 13:35:21 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 48.57) from minibatch trials...

 Average Metric: 50.00 / 100 (50.0%): 100%|██████████| 100/100 [00:28<00:00,  3.47it/s]
 2025/09/30 13:35:51 INFO dspy.evaluate.evaluate: Average Metric: 50 / 100 (50.0%)
 2025/09/30 13:35:51 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 50.0
 2025/09/30 13:35:51 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0]
 2025/09/30 13:35:51 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.0
 2025/09/30 13:35:51 INFO dspy.teleprompt.mipro_optimizer_v2: =======================
 2025/09/30 13:35:51 INFO dspy.teleprompt.mipro_optimizer_v2: 

 2025/09/30 13:35:51 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 20 / 25 - Minibatch ==

 Average Metric: 18.00 / 35 (51.4%): 100%|██████████| 35/35 [00:43<00:00,  1.24s/it]
 2025/09/30 13:36:35 INFO dspy.evaluate.evaluate: Average Metric: 18 / 35 (51.4%)
 2025/09/30 13:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 51.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 3'].
 2025/09/30 13:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29, 45.71, 51.43]
 2025/09/30 13:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0]
 2025/09/30 13:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.0
 2025/09/30 13:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 21 / 25 - Minibatch ==

 Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:18<00:00,  1.93it/s]
 2025/09/30 13:36:53 INFO dspy.evaluate.evaluate: Average Metric: 16 / 35 (45.7%)
 2025/09/30 13:36:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 3'].
 2025/09/30 13:36:53 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29, 45.71, 51.43, 45.71]
 2025/09/30 13:36:53 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0]
 2025/09/30 13:36:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.0
 2025/09/30 13:36:53 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:36:53 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 22 / 25 - Minibatch ==

 Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:33<00:00,  1.04it/s]
 2025/09/30 13:37:27 INFO dspy.evaluate.evaluate: Average Metric: 15 / 35 (42.9%)
 2025/09/30 13:37:27 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 3'].
 2025/09/30 13:37:27 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29, 45.71, 51.43, 45.71, 42.86]
 2025/09/30 13:37:27 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0]
 2025/09/30 13:37:27 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.0
 2025/09/30 13:37:27 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:37:27 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 23 / 25 - Minibatch ==

 Average Metric: 13.00 / 35 (37.1%): 100%|██████████| 35/35 [00:35<00:00,  1.01s/it]
 2025/09/30 13:38:03 INFO dspy.evaluate.evaluate: Average Metric: 13 / 35 (37.1%)
 2025/09/30 13:38:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 37.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 5'].
 2025/09/30 13:38:03 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29, 45.71, 51.43, 45.71, 42.86, 37.14]
 2025/09/30 13:38:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0]
 2025/09/30 13:38:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.0
 2025/09/30 13:38:03 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:38:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 24 / 25 - Minibatch ==

 Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:28<00:00,  1.21it/s]
 2025/09/30 13:38:32 INFO dspy.evaluate.evaluate: Average Metric: 16 / 35 (45.7%)
 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 3'].
 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [8.57, 45.71, 14.29, 48.57, 48.57, 31.43, 45.71, 20.0, 45.71, 48.57, 40.0, 57.14, 42.86, 34.29, 45.71, 51.43, 45.71, 42.86, 37.14, 45.71]
 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0]
 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.0
 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================


 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 25 / 25 - Full Evaluation =====
 2025/09/30 13:38:32 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 48.57) from minibatch trials...

 Average Metric: 52.00 / 100 (52.0%): 100%|██████████| 100/100 [00:33<00:00,  3.02it/s]
 2025/09/30 13:39:06 INFO dspy.evaluate.evaluate: Average Metric: 52 / 100 (52.0%)
 2025/09/30 13:39:06 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 52.0
 2025/09/30 13:39:06 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [31.0, 46.0, 46.0, 50.0, 52.0]
 2025/09/30 13:39:06 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 52.0
 2025/09/30 13:39:06 INFO dspy.teleprompt.mipro_optimizer_v2: =======================
 2025/09/30 13:39:06 INFO dspy.teleprompt.mipro_optimizer_v2: 

 2025/09/30 13:39:06 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 52.0!
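
A log like the one above can be summarized with a few lines of Python. The sketch below is not part of DSPy: `summarize_mipro_log`, the two regular expressions, and the shortened `sample` text are hypothetical helpers that simply assume the "Score: ... on minibatch of size ..." and "Full eval scores so far: [...]" line shapes seen in this run. It also rechecks how a minibatch score relates to the raw count (3 correct out of 35 is reported as 8.57).

```python
import re
from typing import Dict, List

# Patterns assumed from the line shapes in the log above (not a DSPy API).
MINIBATCH_RE = re.compile(r"Score: ([0-9.]+) on minibatch of size (\d+)")
FULL_EVAL_RE = re.compile(r"Full eval scores so far: \[([^\]]*)\]")


def summarize_mipro_log(text: str) -> Dict[str, object]:
    """Collect minibatch scores and the last reported full-eval score list."""
    minibatch_scores: List[float] = [
        float(m.group(1)) for m in MINIBATCH_RE.finditer(text)
    ]
    full_lists = FULL_EVAL_RE.findall(text)
    full_scores: List[float] = []
    if full_lists and full_lists[-1].strip():
        # The last "so far" line holds the complete history of full evals.
        full_scores = [float(x) for x in full_lists[-1].split(",")]
    return {
        "minibatch_scores": minibatch_scores,
        "full_eval_scores": full_scores,
        "best_full_score": max(full_scores) if full_scores else None,
    }


def percent_score(correct: int, total: int) -> float:
    """Percentage rounded to two decimals, matching the logged score format."""
    return round(100.0 * correct / total, 2)


# Shortened excerpt in the same shape as the log (parameter lists trimmed).
sample = """\
Score: 8.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1']
Full eval scores so far: [31.0]
Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1']
Full eval scores so far: [31.0, 46.0]
"""

if __name__ == "__main__":
    print(summarize_mipro_log(sample))
    print(percent_score(3, 35))
```

Run against the full log text, the same function would return all twenty minibatch scores plus the full-eval history, which makes it easier to see that the best full-eval score came from the last trial rather than from the highest-scoring minibatch.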