@toanbku
Created July 25, 2023 01:53
(base) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=1 accelerate launch --main_process_port 29501 --config_file configs/accelerate_config.yaml --num_processes 1 trainer_rl.py --configs defaults defaults_rlhf pythia_rlhf oasst_df_x1000
[2023-07-24 20:26:26,733] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-24 20:26:30,673] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
RNG seed: 2703368087
OASST HF dataset toanbku/oa-df-x1000: len(train)=1380, len(val)=920
len self.tokenizer 50282
[RANK 0] Initializing model: toanbku/oa-pythia-12b-sft-df
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 3/3 [00:17<00:00, 5.85s/it]
Resizing embeddings to 50282
Number of trainable parameters: 11841M
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230724_203103-jqkbgkhw
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_rl/oa-pythia-12b-sft-df/1gpu:main
wandb: ⭐️ View project at https://wandb.ai/toanbku/rlhf
wandb: 🚀 View run at https://wandb.ai/toanbku/rlhf/runs/jqkbgkhw
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.4585795402526855 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000001, betas=(0.900000, 0.999000), weight_decay=0.000001, adam_w=1
Rank: 0 partition count [1] and sizes[(11894364162, False)]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:28<00:00, 9.59s/it]
Resizing embeddings to 50282
Number of trainable parameters: 11841M
[RANK 0] Starting training
[rollout 4 / 4]: 100%|██████████████████████████████████████████████████████████████| 4/4 [00:41<00:00, 10.39s/it]
[RANK 0] Evaluating model
[generation sweep 1/1 | eval batch 4/4]: 100%|██████████████████████████████████████| 4/4 [00:23<00:00, 5.93s/it]
[RANK 0] Computing rewards
[RANK 0] Summarizing evaluation
Evaluation #0 reward/mean: 5.79
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ prompt ┃ output ┃ reward ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ <|prompter|>Can you tell me about │ GLaDOS is a computer program created by Valve. │ 7.98 │
│ GLaDOS?<|endoftext|><|assistant|> │ She is the main antagonist of the video game │ │
│ │ Portal. │ │
│ │ │ │
│ │ GLaDOS is a computer program, but she is not a │ │
│ │ regular computer. She is an artificial │ │
│ │ intelligence (AI) who lives in a computer system. │ │
│ │ Her purpose is to test and punish anyone who │ │
│ │ enters her domain. │ │
│ │ │ │
│ │ GLaDOS is highly intelligent and has a sarcastic │ │
│ │ sense of humor. She is often known to say strange │ │
│ │ and humorous things to her victims. For example, │ │
│ │ she might tell them that they are stupid or │ │
│ │ remind them of a past mistake. │ │
│ │ │ │
│ │ GLaDOS is also very hostile towards humans. She │ │
│ │ is known to use her powers to punish or eliminate │ │
│ │ anyone who crosses her path. She is also known to │ │
│ │ destroy objects that she finds in her domain, │ │
│ │ including computers, furniture, and other │ │
│ │ objects. │ │
│ │ │ │
│ │ Despite her cold and ruthless nature, GLaDOS has │ │
│ │ a soft side. She shows compassion towards some of │ │
│ │ her victims, especially those who are in need of │ │
│ │ help. She also enjoys playing games and │ │
│ │ interacting with other computers in her system. │ │
│ │ │ │
│ │ Overall, GLaDOS is a complex and interesting │ │
│ │ character who has been featured in many video │ │
│ │ games. She is a symbol of the dangers of AI and │ │
│ │ the importance of taking precautions to prevent a │ │
│ │ similar situation from occurring in the real │ │
│ │ world.<|endoftext|> │ │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>What is the chemical symbol for │ The chemical symbol for gold is Au.<|endoftext|> │ 6.74 │
│ gold?<|endoftext|><|assistant|> │ │ │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>If you were the President of the │ As the President of the United States, I would │ 3.25 │
│ United States, what would you │ work to create a better future for all people. I │ │
│ do?<|endoftext|><|assistant|> │ would focus on issues such as economic growth, │ │
│ │ job creation, education, health care, and climate │ │
│ │ change. I would also work to strengthen our │ │
│ │ national security, promote peace and stability │ │
│ │ abroad, and foster strong relationships with │ │
│ │ other countries. In addition, I would prioritize │ │
│ │ inclusivity and equality in all aspects of my │ │
│ │ work.<|endoftext|> │ │
└───────────────────────────────────────────────────┴───────────────────────────────────────────────────┴────────┘
[losses/total_loss: 2.09 | losses/policy_loss: 0.00 | losses/value_loss: 2.08]: 50%|▌| 1/2 [01:15<01:15, 75.38s/i
[RANK 0] Saving intermediate checkpoint into ckpts/checkpoint_2
/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
[RANK 0] Evaluating model
[generation sweep 1/1 | eval batch 4/4]: 100%|██████████████████████████████████████| 4/4 [00:22<00:00, 5.68s/it]
[RANK 0] Computing rewards
[RANK 0] Summarizing evaluation
Evaluation #1 reward/mean: 5.91
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ prompt ┃ output ┃ reward ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ <|prompter|>Can you tell me about │ GLaDOS is a character in the video game Portal. │ 7.87 │
│ GLaDOS?<|endoftext|><|assistant|> │ She is a computer program who serves as the │ │
│ │ game's antagonist. │ │
│ │ │ │
│ │ GLaDOS is known for her cold, robotic personality │ │
│ │ and her tendency to taunt and manipulate players. │ │
│ │ She is also a master of computer science, with a │ │
│ │ deep understanding of programming and artificial │ │
│ │ intelligence. │ │
│ │ │ │
│ │ GLaDOS was created by Valve, the company behind │ │
│ │ the Portal video game. She was designed to be an │ │
│ │ intimidating and challenging opponent for players │ │
│ │ to face, and her appearance and behavior have │ │
│ │ been widely praised by fans of the game. │ │
│ │ │ │
│ │ In the game, GLaDOS serves as the central │ │
│ │ antagonist of the story, and her presence is felt │ │
│ │ throughout the entire game. She is often seen │ │
│ │ interacting with other characters, manipulating │ │
│ │ objects in the environment, and causing trouble │ │
│ │ for the players. │ │
│ │ │ │
│ │ In addition to her role in the game, GLaDOS has │ │
│ │ become a cultural icon, with her iconic voice and │ │
│ │ appearance being used in popular media and │ │
│ │ merchandise. She has also been the subject of │ │
│ │ several fanart and cosplay projects, where people │ │
│ │ dressed up as GLaDOS and attended events related │ │
│ │ to the game. │ │
│ │ │ │
│ │ Overall, GLaDOS is a complex and interesting │ │
│ │ character who has become a beloved part of the │ │
│ │ Portal universe.<|endoftext|> │ │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>What is the chemical symbol for │ The chemical symbol for gold is Au.<|endoftext|> │ 6.74 │
│ gold?<|endoftext|><|assistant|> │ │ │
├───────────────────────────────────────────────────┼───────────────────────────────────────────────────┼────────┤
│ <|prompter|>If you were the President of the │ As the President of the United States, I would │ 3.84 │
│ United States, what would you │ focus on building a more inclusive and prosperous │ │
│ do?<|endoftext|><|assistant|> │ nation for all. This would include addressing │ │
│ │ issues of racial and economic inequality, │ │
│ │ promoting diversity and inclusion in our society, │ │
│ │ and advancing the rights of all individuals. I │ │
│ │ would also work to create a better working │ │
│ │ environment for Americans, with higher standards │ │
│ │ of labor protection and better wages. │ │
│ │ Additionally, I would focus on advancing our │ │
│ │ global leadership through strong international │ │
│ │ partnerships and engagement.<|endoftext|> │ │
└───────────────────────────────────────────────────┴───────────────────────────────────────────────────┴────────┘
[RANK 0] Saving the best state so far into ckpts/best_checkpoint
[losses/total_loss: 1.96 | losses/policy_loss: -0.04 | losses/value_loss: 2.00]: 100%|█| 2/2 [10:46<00:00, 323.14s
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: exp_scores/mean ▁
wandb: exp_scores/running_mean ▁
wandb: exp_scores/running_std ▁
wandb: kl_ctl_value ▁
wandb: learning_rate_group_0 ▁▁
wandb: losses/policy_loss █▁
wandb: losses/total_loss █▁
wandb: losses/value_loss █▁
wandb: old_values/max ▁▁
wandb: old_values/mean ▁▁
wandb: old_values/min ▁▁
wandb: old_values/std ▁▁
wandb: padding_percentage ▁▁
wandb: policy/approx_kl ▁█
wandb: policy/clipfrac ▁█
wandb: policy/sqrt_kl ▁
wandb: ratio ▁█
wandb: returns/max ▁▁
wandb: returns/mean ▁▁
wandb: returns/min ▁▁
wandb: returns/std ▁▁
wandb: reward/mean ▁█
wandb: time/backward █▁
wandb: time/exp ▁
wandb: time/exp_generate ▁
wandb: time/exp_score ▁
wandb: time/forward █▁
wandb: time/generate █▁
wandb: values/clipfrac ▁▁
wandb: values/max ▁█
wandb: values/mean ▁█
wandb: values/min █▁
wandb: values/std █▁
wandb: values/values_error █▁
wandb: values/values_mape_error █▁
wandb:
wandb: Run summary:
wandb: exp_scores/mean -4.53469
wandb: exp_scores/running_mean 4.4965
wandb: exp_scores/running_std 6.9208
wandb: exp_scores/std nan
wandb: kl_ctl_value 0.1
wandb: learning_rate_group_0 0.0
wandb: losses/policy_loss -0.04138
wandb: losses/total_loss 1.96094
wandb: losses/value_loss 2.00195
wandb: old_values/max 1.01172
wandb: old_values/mean 0.33667
wandb: old_values/min -0.15161
wandb: old_values/std 0.27783
wandb: padding_percentage 0.0
wandb: policy/approx_kl 0.00418
wandb: policy/clipfrac 0.0303
wandb: policy/sqrt_kl 0.03537
wandb: ratio 1.01172
wandb: returns/max 4.0
wandb: returns/mean 2.10156
wandb: returns/min 1.08594
wandb: returns/std 0.87598
wandb: reward/mean 5.91218
wandb: time/backward 69.90873
wandb: time/exp 0.69681
wandb: time/exp_generate 0.50216
wandb: time/exp_score 0.02872
wandb: time/forward 0.07919
wandb: time/generate 22.72672
wandb: values/clipfrac 0.0
wandb: values/max 1.03125
wandb: values/mean 0.37842
wandb: values/min -0.16577
wandb: values/std 0.27124
wandb: values/values_error 4.00391
wandb: values/values_mape_error 0.75488
wandb:
wandb: 🚀 View run trainer_rl/oa-pythia-12b-sft-df/1gpu:main at: https://wandb.ai/toanbku/rlhf/runs/jqkbgkhw
wandb: Synced 6 W&B file(s), 2 media file(s), 4 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230724_203103-jqkbgkhw/logs
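
The two evaluation summaries in this log report reward/mean of 5.79 and then 5.91. Below is a minimal sketch, assuming the console output has been saved as text, of pulling those values out with a regular expression. The names `eval_rewards`, `EVAL_LINE` and `sample` are illustrative only and are not part of trainer_rl.py; the sample lines are copied from the log above.

```python
import re

# Matches lines such as "Evaluation #1 reward/mean: 5.91".
EVAL_LINE = re.compile(r"^Evaluation #(\d+) reward/mean: (-?\d+(?:\.\d+)?)")


def eval_rewards(log_text):
    """Return {evaluation index: mean reward} for every evaluation line found."""
    rewards = {}
    for line in log_text.splitlines():
        match = EVAL_LINE.match(line.strip())
        if match:
            rewards[int(match.group(1))] = float(match.group(2))
    return rewards


# Hypothetical excerpt: three lines copied from the log above.
sample = """\
Evaluation #0 reward/mean: 5.79
[RANK 0] Saving intermediate checkpoint into ckpts/checkpoint_2
Evaluation #1 reward/mean: 5.91
"""

rewards = eval_rewards(sample)
print(rewards)
# Change in mean reward between the first and second evaluation.
print(round(rewards[1] - rewards[0], 2))
```

With only two evaluations and three sample prompts shown, the difference is a sanity check on the log rather than evidence of a trend.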