.. _tutorials-monitor-your-experiments: :octicon:`codescan-checkmark` Monitor Your Experiments ====================================================== .. dropdown:: What you will learn :icon: multi-select :animate: fade-in * How to monitor your experiments using Tensorboard * How to monitor your experiments using WanDB .. dropdown:: Prerequisites :icon: multi-select :animate: fade-in * Get familiar with fairseq2 basics (:ref:`basics-overview`) * Ensure you have fairseq2 installed (:ref:`installation`) TensorBoard ----------- .. image:: /_static/img/tutorials/end_to_end_fine_tuning/tutorial_example_trace.png :width: 580px :align: center :alt: TensorBoard fairseq2 saves checkpoints and tensorboard events to the defined ``$OUTPUT_DIR``, which allows you to investigate into the details in your jobs. .. code-block:: bash # run tensorboard at your ckpt path tensorboard --logdir $CHECKPOINT_PATH # example tensorboard --logdir /checkpoint/$USER/outputs/ps_llama3_1_instruct.ws_16.a73dad52/tb/train If you ran your experiment on your server, you probably need to port forward the tensorboard service to your local machine: .. code-block:: bash ssh -L 6006:localhost:6006 $USER@$SERVER_NAME Then you can view the tensorboard service in your browser `http://localhost:6006 `__. WanDB ------ .. image:: /_static/img/tutorials/end_to_end_fine_tuning/tutorial_example_wandb.png :width: 580px :align: center :alt: WandB fairseq2 natively support WanDB (Weights & Biases) - a powerful tool for monitoring and managing machine learning experiments. WanDB provides a centralized platform to track, compare, and analyze the performance of different models, making it easier to identify trends, optimize hyperparameters, and reproduce results. Follow the `quick start guide `__ to initialize it in your environment. What you need to do is simply add the following line in your config YAML file: .. code-block:: yaml common: metric_recorders: wandb: _set_: enabled: true project: run: Then run your recipe with ``fairseq2 ... --config-file .yaml``. Then you can open up your WanDB Portal and check the results in real-time. .. dropdown:: A step-by-step example :icon: code :animate: fade-in .. code-block:: bash ENV_NAME=... # YOUR_ENV_NAME CONFIG_FILE=... # YOUR_CONFIG_FILE OUTPUT_DIR=... # YOUR_OUTPUT_DIR conda activate $ENV_NAME # install wandb pip install wandb # initialize wandb, copy paste your token when prompted wandb login --host=... # your wandb hostname # now you are good to go fairseq2 lm instruction_finetune $OUTPUT_DIR \ --config-file $CONFIG_FILE \ # cleanup conda deactivate