azure / azure-llm-fine-tuning
File Size

The distribution of file sizes, measured in lines of code.

File Size Overall (each bucket's share of the total lines of code):
  1001+ lines:      0%
  501-1000 lines:  65%
  201-500 lines:   26%
  101-200 lines:    3%
  1-100 lines:      4%


File Size per Extension

Extension    1001+   501-1000   201-500   101-200   1-100
ipynb           0%        86%        8%        4%      0%
py              0%         0%       84%        0%     15%
yaml            0%         0%        0%        0%    100%
jsonl           0%         0%        0%        0%    100%
File Size per Logical Decomposition (primary)

Component        1001+   501-1000   201-500   101-200   1-100
phi3                0%        69%       20%        5%      4%
florence2-VQA       0%        84%       11%        0%      3%
aoai                0%         0%       95%        0%      4%
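
The bucket percentages above can be reproduced by counting lines per file and grouping the totals into the same size bins. Below is a minimal sketch, assuming the bins from this report and that each bucket's percentage is its share of the repository's total line count; the directory to scan and the extension filter are illustrative assumptions, and counting raw lines in .ipynb JSON only approximates a real notebook LOC counter.

```python
import os

# Size bins used in this report (lines per file).
BINS = [(1, 100), (101, 200), (201, 500), (501, 1000), (1001, float("inf"))]

def count_lines(path):
    """Count physical lines in a file (a stand-in for a real LOC counter)."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)

def size_distribution(root, extensions=(".py", ".ipynb", ".yaml", ".jsonl")):
    """Return each bin's share of the total line count, in percent."""
    totals = {b: 0 for b in BINS}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            loc = count_lines(os.path.join(dirpath, name))
            for lo, hi in BINS:
                if lo <= loc <= hi:
                    totals[(lo, hi)] += loc
                    break
    grand_total = sum(totals.values()) or 1
    return {b: 100.0 * n / grand_total for b, n in totals.items()}

if __name__ == "__main__":
    for (lo, hi), pct in size_distribution(".").items():
        label = f"{lo}+" if hi == float("inf") else f"{lo}-{hi}"
        print(f"{label:>9}: {pct:5.1f}%")
```
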
Longest Files (Top 22)
File                                                      # lines   # units
(unnamed notebook in phi3)                                    866         -
phi3/1_training_custom_phi3.ipynb                             863         -
florence2-VQA/2_serving_florence2.ipynb                       811         -
florence2-VQA/1_training_mlflow_florence2.ipynb               640         -
phi3/3_optimization_olive.ipynb                               572         -
aoai/aoai_finetune.ipynb                                      384         -
phi3/olive/phi3.py                                            260         5
aoai/token_count_utils.py                                     244        12
phi3/src_train/train_mlflow.py                                208         5
florence2-VQA/src_train/train_mlflow.py                       206        10
phi3/src_train/train.py                                       204         4
phi3/dataset-preparation/1_data-preparation-basic.ipynb       189         -
florence2-VQA/src_serve/score.py                               57         3
phi3/dataset-preparation/train_tokenizer.py                    51         2
phi3/dataset-preparation/combine_tokenizer.py                  39         2
phi3/src_serve/score.py                                        28         2
phi3/olive/conda.yaml                                          25         -
aoai/logger.py                                                 12         -
phi3/logger.py                                                 12         -
florence2-VQA/logger.py                                        12         -
aoai/dataset/training_set.jsonl                                10         -
aoai/dataset/validation_set.jsonl                              10         -
Files With Most Units (Top 9)
File                                                      # lines   # units
aoai/token_count_utils.py                                     244        12
florence2-VQA/src_train/train_mlflow.py                       206        10
phi3/src_train/train_mlflow.py                                208         5
phi3/olive/phi3.py                                            260         5
phi3/src_train/train.py                                       204         4
florence2-VQA/src_serve/score.py                               57         3
phi3/dataset-preparation/combine_tokenizer.py                  39         2
phi3/dataset-preparation/train_tokenizer.py                    51         2
phi3/src_serve/score.py                                        28         2
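
The "# units" column is this report's count of code units per file; its exact definition is not stated here, so the sketch below approximates it by counting function and method definitions in a Python file with the standard ast module. Treat the mapping to the reported numbers as an assumption.

```python
import ast

def count_units(path):
    """Approximate '# units' as the number of function/method definitions."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    return sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
               for node in ast.walk(tree))

# Paths from this report; it lists 5 and 10 units for these two files.
# count_units("phi3/olive/phi3.py")
# count_units("florence2-VQA/src_train/train_mlflow.py")
```
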
Files With Long Lines (Top 16)

There are 16 files with lines longer than 120 characters. In total, there are 173 long lines.
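
These counts follow directly from the 120-character threshold: a line is "long" if it exceeds 120 characters, and a file appears below if it contains at least one such line. A minimal sketch of the counting (the list of paths to scan is the caller's input):

```python
def count_long_lines(path, threshold=120):
    """Count lines longer than `threshold` characters in one file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sum(1 for line in f if len(line.rstrip("\n")) > threshold)

def long_line_summary(paths, threshold=120):
    """Return (number of files with long lines, total number of long lines)."""
    counts = {p: count_long_lines(p, threshold) for p in paths}
    offenders = {p: n for p, n in counts.items() if n > 0}
    return len(offenders), sum(offenders.values())

# For this repository the report finds 16 offending files and 173 long lines.
```
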

File                                                      # lines   # units   # long lines
aoai/aoai_finetune.ipynb                                      384         -             28
(unnamed notebook in phi3)                                    866         -             25
phi3/1_training_custom_phi3.ipynb                             863         -             25
florence2-VQA/1_training_mlflow_florence2.ipynb               640         -             21
phi3/3_optimization_olive.ipynb                               572         -             14
florence2-VQA/2_serving_florence2.ipynb                       811         -             12
aoai/dataset/training_set.jsonl                                10         -             10
aoai/dataset/validation_set.jsonl                              10         -             10
florence2-VQA/src_train/train_mlflow.py                       206        10             10
phi3/src_train/train.py                                       204         4              5
phi3/src_train/train_mlflow.py                                208         5              4
phi3/dataset-preparation/1_data-preparation-basic.ipynb       189         -              3
aoai/token_count_utils.py                                     244        12              2
phi3/dataset-preparation/combine_tokenizer.py                  39         2              2
phi3/src_serve/score.py                                        28         2              1
florence2-VQA/src_serve/score.py                               57         3              1
Correlations

File Size vs. Commits (all time): 22 points

File                                                      Commits (all time)   Lines of code
florence2-VQA/1_training_mlflow_florence2.ipynb                            2             640
florence2-VQA/2_serving_florence2.ipynb                                    2             811
phi3/1_training_custom_phi3.ipynb                                          2             863
phi3/src_serve/score.py                                                    4              28
phi3/src_train/train.py                                                    3             204
phi3/src_train/train_mlflow.py                                             3             208
phi3/3_optimization_olive.ipynb                                            3             572
phi3/dataset-preparation/train_tokenizer.py                                3              51
florence2-VQA/logger.py                                                    2              12
florence2-VQA/src_serve/score.py                                           2              57
phi3/dataset-preparation/1_data-preparation-basic.ipynb                    3             189
phi3/olive/conda.yaml                                                      2              25
phi3/olive/phi3.py                                                         2             260
aoai/aoai_finetune.ipynb                                                   2             384
aoai/dataset/training_set.jsonl                                            1              10
aoai/logger.py                                                             1              12
aoai/token_count_utils.py                                                  1             244
phi3/dataset-preparation/combine_tokenizer.py                              2              39
florence2-VQA/src_train/train_mlflow.py                                    1             206
Lines of code:       min 10.0 | average 259.23 | 25th percentile 21.75 | median 196.5 | 75th percentile 431.0 | max 866.0
Commits (all time):  min 1.0 | average 2.05 | 25th percentile 1.0 | median 2.0 | 75th percentile 3.0 | max 4.0
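
The per-axis summaries can be recomputed from the plotted values. Using the 22 line counts from the Longest Files table, NumPy's "weibull" (exclusive) percentile interpolation reproduces the quartiles shown above; whether the report generator uses exactly that convention is an assumption.

```python
import numpy as np  # the method= argument below needs NumPy >= 1.22

# Lines of code for the 22 files in "Longest Files (Top 22)".
loc = [866, 863, 811, 640, 572, 384, 260, 244, 208, 206, 204,
       189, 57, 51, 39, 28, 25, 12, 12, 12, 10, 10]

def summarize(values, method="weibull"):
    """Min / average / quartiles / max, in the report's layout."""
    v = np.asarray(values, dtype=float)
    q25, q50, q75 = np.percentile(v, [25, 50, 75], method=method)
    return {"min": v.min(), "average": round(v.mean(), 2),
            "25th percentile": q25, "median": q50,
            "75th percentile": q75, "max": v.max()}

print(summarize(loc))
# min 10.0, average 259.23, 25th percentile 21.75,
# median 196.5, 75th percentile 431.0, max 866.0
```
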

File Size vs. Contributors (all time): 22 points

Every file in this data set has exactly one contributor over the project's history, so the scatter collapses to a single column; the lines-of-code values match the commits plot above.

Lines of code:             min 10.0 | average 259.23 | 25th percentile 21.75 | median 196.5 | 75th percentile 431.0 | max 866.0
Contributors (all time):   min 1.0 | average 1.0 | 25th percentile 1.0 | median 1.0 | 75th percentile 1.0 | max 1.0

File Size vs. Commits (30 days): 0 points

No data for "commits (30d)" vs. "lines of code".

File Size vs. Contributors (30 days): 0 points

No data for "contributors (30d)" vs. "lines of code".


File Size vs. Commits (90 days): 9 points

File                                                      Commits (90d)   Lines of code
florence2-VQA/1_training_mlflow_florence2.ipynb                       2             640
florence2-VQA/2_serving_florence2.ipynb                               2             811
phi3/1_training_custom_phi3.ipynb                                     2             863
phi3/src_serve/score.py                                               2              28
phi3/src_train/train.py                                               1             204
phi3/src_train/train_mlflow.py                                        1             208
phi3/3_optimization_olive.ipynb                                       1             572
phi3/dataset-preparation/train_tokenizer.py                           1              51
Lines of code:   min 28.0 | average 471.44 | 25th percentile 127.5 | median 572.0 | 75th percentile 837.0 | max 866.0
Commits (90d):   min 1.0 | average 1.56 | 25th percentile 1.0 | median 2.0 | 75th percentile 2.0 | max 2.0

File Size vs. Contributors (90 days): 9 points

Every file touched in the last 90 days has exactly one contributor in that window, so the scatter again collapses to a single column; the lines-of-code values match the 90-day commits plot above.

Lines of code:        min 28.0 | average 471.44 | 25th percentile 127.5 | median 572.0 | 75th percentile 837.0 | max 866.0
Contributors (90d):   min 1.0 | average 1.0 | 25th percentile 1.0 | median 1.0 | 75th percentile 1.0 | max 1.0