@cuberick/codellama-7b-hf

Quickstart

flwr new @cuberick/codellama-7b-hf

Readme

Model Details

Our method is based on bigcode/starcoder2-7b.

How to Get Started with the Model

First, set up the enviroment following the main README.md file.

Then use the code below to get started with the model.

flwr run .

Training Details

Training Data

We train with the default supplied data as:

dataset.name = "flwrlabs/code-alpaca-20k"

Training Hyperparameters

model.name = "codellama/CodeLlama-7b-hf"
model.quantization = 4
model.gradient-checkpointing = true
model.lora.peft-lora-r = 8 
model.lora.peft-lora-alpha = 16 
train.save-every-round = 5
train.learning-rate-max = 5e-5
train.learning-rate-min = 1e-6
train.seq-length = 512
train.training-arguments.output-dir = ""
train.training-arguments.learning-rate = ""
train.training-arguments.per-device-train-batch-size = 16 # 16
train.training-arguments.gradient-accumulation-steps = 1
train.training-arguments.logging-steps = 10
train.training-arguments.num-train-epochs = 3
train.training-arguments.max-steps = 10
train.training-arguments.save-steps = 1000
train.training-arguments.save-total-limit = 10
train.training-arguments.gradient-checkpointing = true
train.training-arguments.lr-scheduler-type = "constant"
strategy.fraction-fit = 0.2
strategy.fraction-evaluate = 0.0
num-server-rounds = 100

Communication Cost

25606 MB

Evaluation

Download the checkpoints at this link.

Procedures

See this for downloading the necessary packages and eval script. Below, we provide the commands to run the evaluations on each metric respectively.

For codellama/CodeLlama-7b-hf results:

# humaneval
python main.py \
--model=codellama/CodeLlama-7b-hf \
--peft_model=path_to_the_model/peft_50  \
--max_length_generation=1024  \
--batch_size=4 \
--use_auth_token \
--allow_code_execution \
--save_generations  \
--save_references \
--tasks=humaneval \
--metric_output_path=./CodeLlama-7b-hf/evaluation_results_humaneval.json

# mbpp
python main.py \
--model=codellama/CodeLlama-7b-hf \
--peft_model=path_to_the_model/peft_50  \
--max_length_generation=2048  \
--batch_size=4 \
--use_auth_token \
--allow_code_execution \
--save_generations  \
--save_references \
--tasks=mbpp \
--metric_output_path=./CodeLlama-7b-hf/evaluation_results_mbpp.json

# multiple-js
python main.py \
--model=codellama/CodeLlama-7b-hf \
--peft_model=path_to_the_model/peft_50  \
--max_length_generation=1024  \
--batch_size=4 \
--use_auth_token \
--allow_code_execution \
--save_generations  \
--save_references \
--tasks=multiple-js \
--metric_output_path=./CodeLlama-7b-hf/evaluation_results_multiple_js.json

# multiple-cpp
python main.py \
--model=codellama/CodeLlama-7b-hf \
--peft_model=path_to_the_model/peft_50  \
--max_length_generation=1024  \
--batch_size=4 \
--use_auth_token \
--allow_code_execution \
--save_generations  \
--save_references \
--tasks=multiple-cpp \
--metric_output_path=./CodeLlama-7b-hf/evaluation_results_multiple_cpp.json

Results

Average: 33.78

MBPP: 38.26

HumanEval: 31.71

MultiPL-E (JS): 33.54

MultiPL-E (C++): 31.68

Framework versions

PEFT 0.14.0