# LLM Finetuning using AutoTrain Advanced

In this notebook, we will fine-tune the Llama-3.2-1B-Instruct model using AutoTrain Advanced.
You can swap in any model compatible with Hugging Face transformers, and any other dataset that is formatted correctly.
For dataset formatting, see the [docs](https://huggingface.co/docs/autotrain/index).

In [2]:
from autotrain.params import LLMTrainingParams
from autotrain.project import AutoTrainProject

In [None]:
HF_USERNAME = "your_huggingface_username"
HF_TOKEN = "your_huggingface_write_token" # get it from https://huggingface.co/settings/tokens
# Storing HF_TOKEN in a secret or environment variable is recommended over hardcoding it here.
# The token is required if push_to_hub is set to True or if you are accessing a gated model/dataset.
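
As a minimal sketch of the environment-variable approach recommended above, the token can be read at runtime instead of being hardcoded. The variable name `HF_TOKEN` and the helper `read_hf_token` are assumptions made for illustration; they are not part of AutoTrain.

```python
import os

def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in `env_var`, or None if it is unset or empty."""
    token = os.environ.get(env_var, "").strip()
    return token or None

# Assumes the token was exported beforehand, e.g. in the shell or a notebook secret.
HF_TOKEN = read_hf_token()
if HF_TOKEN is None:
    print("HF_TOKEN is not set; pushing to the Hub and gated models will not work.")
```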

In [None]:
params = LLMTrainingParams(
    model="meta-llama/Llama-3.2-1B-Instruct",
    data_path="HuggingFaceH4/no_robots", # path to the dataset on huggingface hub
    chat_template="tokenizer", # using the chat template defined in the model's tokenizer
    text_column="messages", # the column in the dataset that contains the text
    train_split="train",
    trainer="sft", # using the SFT trainer, choose from sft, default, orpo, dpo and reward
    epochs=3,
    batch_size=1,
    lr=1e-5,
    peft=True, # training LoRA using PEFT
    quantization="int4", # using int4 quantization
    target_modules="all-linear",
    padding="right",
    optimizer="paged_adamw_8bit",
    scheduler="cosine",
    gradient_accumulation=8,
    mixed_precision="bf16",
    merge_adapter=True,
    project_name="autotrain-llama32-1b-finetune",
    log="tensorboard",
    push_to_hub=True,
    username=HF_USERNAME,
    token=HF_TOKEN,
)
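
With `batch_size=1` and `gradient_accumulation=8`, gradients from 8 micro-batches are accumulated before each optimizer step, so each update is based on 8 examples per device. The arithmetic can be sketched as follows; the helper name is hypothetical and a single device is assumed by default.

```python
def effective_batch_size(batch_size, gradient_accumulation, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return batch_size * gradient_accumulation * num_devices

# Values taken from the params above, assuming one GPU.
print(effective_batch_size(batch_size=1, gradient_accumulation=8))  # 8
```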

If your dataset is stored locally in CSV or JSONL format (JSONL is preferred), make the following changes to `params`:

```python
params = LLMTrainingParams(
    data_path="data/", # path to the folder containing train.jsonl or train.csv
    text_column="text", # name of the column in the CSV/JSONL file that contains the text
    train_split="train", # the filename without its extension
    # ... keep the remaining parameters as in the cell above
)
```
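
A minimal sketch of preparing such a local dataset, assuming one JSON object per line with a single `text` field. The sample rows and the helper name `write_jsonl` are made up for illustration; the folder and file names match the `data_path` and `train_split` values above.

```python
import json
from pathlib import Path

def write_jsonl(rows, folder="data", split="train"):
    """Write `rows` (a list of dicts) to <folder>/<split>.jsonl, one JSON object per line."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{split}.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return out_path

# Placeholder rows for illustration only; replace them with your own training text.
rows = [
    {"text": "Question: What is fine-tuning?\nAnswer: Further training of a pretrained model."},
    {"text": "Question: What is LoRA?\nAnswer: A parameter-efficient fine-tuning method."},
]
path = write_jsonl(rows)
print(f"Wrote {len(rows)} rows to {path}")
```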

In [None]:
# this will train the model locally
project = AutoTrainProject(params=params, backend="local", process=True)
project.create()