huggingface / data-is-better-together
File Size

The distribution of file sizes, measured in lines of code.

File Size Overall
Size (lines of code) | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
All files | 48% | 6% | 22% | 13% | 8%
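The percentages in these size tables can be reproduced from the line counts listed under "Longest Files". Each figure is the share of all lines of code that sits in files of that size class, truncated to a whole percent. For example, the three notebooks over 1,000 lines hold 3,994 of the 4,837 notebook lines, which gives the 82% shown for ipynb. The sketch below assumes that calculation: the bucket boundaries come from the legend, while the truncation rule is inferred from the published figures rather than documented by the tool.

```python
"""Sketch: bucket files by line count and report each bucket's share of all lines."""

# Size classes from the report legend: (label, min_lines, max_lines or None).
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def bucket_label(lines: int) -> str:
    """Return the size-class label for a file with the given line count."""
    for label, low, high in BUCKETS:
        if lines >= low and (high is None or lines <= high):
            return label
    raise ValueError(f"line count must be >= 1, got {lines}")


def size_distribution(line_counts: list[int]) -> dict[str, int]:
    """Percentage of all lines of code that falls in each size class.

    Percentages are truncated to whole numbers (an assumption inferred from
    this report's figures), so they may not sum to exactly 100.
    """
    total = sum(line_counts)
    if total == 0:
        raise ValueError("need at least one non-empty file")
    per_bucket = {label: 0 for label, _, _ in BUCKETS}
    for lines in line_counts:
        # Weight each file by its size, not by 1: the share is of lines, not files.
        per_bucket[bucket_label(lines)] += lines
    return {label: per_bucket[label] * 100 // total for label in per_bucket}


if __name__ == "__main__":
    # Line counts of the seven .ipynb files listed under "Longest Files".
    ipynb = [1458, 1326, 1210, 276, 248, 220, 99]
    print(size_distribution(ipynb))
    # {'1001+': 82, '501-1000': 0, '201-500': 15, '101-200': 0, '1-100': 2}
```

Weighting by lines rather than by file count explains why a few large notebooks dominate the "1001+" column even though most files in the repository are short.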


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
ipynb | 82% | 0% | 15% | 0% | 2%
yaml | 0% | 100% | 0% | 0% | 0%
py | 0% | 0% | 39% | 41% | 19%
html | 0% | 0% | 0% | 0% | 100%
in | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition (primary)
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
community-efforts | 57% | 0% | 32% | 6% | 4%
cookbook-efforts | 36% | 15% | 7% | 25% | 14%
Longest Files (Top 34)
File | Folder | # lines | # units
02_upload_prompt_translation_data.ipynb | community-efforts/prompt_translation | 1458 | -
03_upload_to_argilla.ipynb | community-efforts/image_preferences | 1326 | -
01_create_preference_task.ipynb | cookbook-efforts/kto-preference | 1210 | -
pipeline.yaml | cookbook-efforts/domain-specific-datasets/project_app | 522 | -
01_synthetic_data_generation_total.py | community-efforts/image_preferences | 421 | 6
01_synthetic_data_generation_images.py | community-efforts/image_preferences | 398 | 6
04_binarize_preference_results.ipynb | community-efforts/image_preferences | 276 | -
01_synthetic_data_generation_prompts.py | community-efforts/image_preferences | 258 | 1
01_data_prep.ipynb | cookbook-efforts/dpo-orpo-preference | 248 | -
Translation_with_distilabel_gpt_4_turbo.ipynb | community-efforts/prompt_translation | 220 | -
2_👩🏼‍🔬 Describe Domain.py | cookbook-efforts/domain-specific-datasets/project_app/pages | 197 | -
app.py | community-efforts/prompt_translation/dashboard_template | 187 | 12
3_🌱 Generate Dataset.py | cookbook-efforts/domain-specific-datasets/project_app/pages | 184 | -
domain_expert_pipeline.py | cookbook-efforts/domain-specific-datasets/distilabel_pipelines | 180 | 2
aya_dpo_gen.py | cookbook-efforts/dpo-orpo-preference | 156 | 4
hub.py | cookbook-efforts/domain-specific-datasets/project_app | 122 | 7
02_image_prefernces_cleaned_filtered_sfw.py | community-efforts/image_preferences | 107 | 1
05_fine_tune_flux_lora.ipynb | community-efforts/image_preferences | 99 | -
app.py | cookbook-efforts/domain-specific-datasets/parent_app | 74 | -
custom_preference_to_argilla.py | cookbook-efforts/dpo-orpo-preference | 70 | 3
utils.py | cookbook-efforts/domain-specific-datasets/project_app | 67 | 3
preference_gen.py | cookbook-efforts/kto-preference | 62 | 3
hub.py | cookbook-efforts/domain-specific-datasets/parent_app | 55 | 4
template.html | community-efforts/image_preferences | 53 | -
dumpy.py | community-efforts/prompt_translation/dashboard_template | 43 | -
app.py | cookbook-efforts/domain-specific-datasets/project_app | 43 | -
defaults.py | cookbook-efforts/domain-specific-datasets/project_app | 42 | -
4_🔍 Review Generated Data.py | cookbook-efforts/domain-specific-datasets/project_app/pages | 31 | -
infer.py | cookbook-efforts/domain-specific-datasets/project_app | 19 | 1
00_imgsys_shuffled_deduplicated_cleaned.py | community-efforts/image_preferences | 18 | -
🧑‍🌾 Domain Data Grower.py | cookbook-efforts/domain-specific-datasets/parent_app/pages | 8 | 1
requirements.in | cookbook-efforts/dpo-orpo-preference | 7 | -
requirements.in | cookbook-efforts/kto-preference | 6 | -
requirements.in | community-efforts/prompt_translation | 2 | -
Files With Most Units (Top 14)
File | Folder | # lines | # units
app.py | community-efforts/prompt_translation/dashboard_template | 187 | 12
hub.py | cookbook-efforts/domain-specific-datasets/project_app | 122 | 7
01_synthetic_data_generation_total.py | community-efforts/image_preferences | 421 | 6
01_synthetic_data_generation_images.py | community-efforts/image_preferences | 398 | 6
hub.py | cookbook-efforts/domain-specific-datasets/parent_app | 55 | 4
aya_dpo_gen.py | cookbook-efforts/dpo-orpo-preference | 156 | 4
preference_gen.py | cookbook-efforts/kto-preference | 62 | 3
utils.py | cookbook-efforts/domain-specific-datasets/project_app | 67 | 3
custom_preference_to_argilla.py | cookbook-efforts/dpo-orpo-preference | 70 | 3
domain_expert_pipeline.py | cookbook-efforts/domain-specific-datasets/distilabel_pipelines | 180 | 2
01_synthetic_data_generation_prompts.py | community-efforts/image_preferences | 258 | 1
02_image_prefernces_cleaned_filtered_sfw.py | community-efforts/image_preferences | 107 | 1
infer.py | cookbook-efforts/domain-specific-datasets/project_app | 19 | 1
🧑‍🌾 Domain Data Grower.py | cookbook-efforts/domain-specific-datasets/parent_app/pages | 8 | 1
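The report does not define what a "unit" is. Assuming a unit is one function or method definition (an assumption, not confirmed by the report), a per-file count for Python sources could be sketched with the standard-library `ast` module. The function name `count_units` is hypothetical. Files shown with `-` (the notebooks, YAML, HTML and requirements files, plus several `.py` files) had no units reported.

```python
import ast


def count_units(source: str) -> int:
    """Count function and method definitions in Python source.

    Assumption: one "unit" is one `def` or `async def`, including methods
    and nested functions. Lambdas and class bodies are not counted.
    """
    tree = ast.parse(source)
    # ast.walk visits every node, so nested definitions are included.
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


if __name__ == "__main__":
    sample = '''
class Client:
    def connect(self):
        pass

async def fetch():
    pass

def main():
    helper = lambda x: x
    return helper(1)
'''
    print(count_units(sample))  # 3: connect, fetch, main (the lambda is not counted)
```

Because `ast.walk` includes methods and nested functions, an analyzer that counts only top-level definitions would report lower numbers for the same file.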
Files With Long Lines (Top 13)

There are 13 files with lines longer than 120 characters. In total, there are 144 long lines.

File | Folder | # lines | # units | # long lines
01_create_preference_task.ipynb | cookbook-efforts/kto-preference | 1210 | - | 78
02_upload_prompt_translation_data.ipynb | community-efforts/prompt_translation | 1458 | - | 20
04_binarize_preference_results.ipynb | community-efforts/image_preferences | 276 | - | 14
03_upload_to_argilla.ipynb | community-efforts/image_preferences | 1326 | - | 11
01_data_prep.ipynb | cookbook-efforts/dpo-orpo-preference | 248 | - | 6
05_fine_tune_flux_lora.ipynb | community-efforts/image_preferences | 99 | - | 4
Translation_with_distilabel_gpt_4_turbo.ipynb | community-efforts/prompt_translation | 220 | - | 3
app.py | community-efforts/prompt_translation/dashboard_template | 187 | 12 | 2
2_👩🏼‍🔬 Describe Domain.py | cookbook-efforts/domain-specific-datasets/project_app/pages | 197 | - | 2
3_🌱 Generate Dataset.py | cookbook-efforts/domain-specific-datasets/project_app/pages | 184 | - | 1
app.py | cookbook-efforts/domain-specific-datasets/parent_app | 74 | - | 1
🧑‍🌾 Domain Data Grower.py | cookbook-efforts/domain-specific-datasets/parent_app/pages | 8 | 1 | 1
aya_dpo_gen.py | cookbook-efforts/dpo-orpo-preference | 156 | 4 | 1
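The long-line figures follow from a simple rule: count each line longer than 120 characters, list only files with at least one such line, and sum the per-file counts (here 78 + 20 + 14 + ... + 1 = 144). The sketch below assumes that rule; the helper names are hypothetical, and it reads files as raw text, whereas how the analyzer actually treats `.ipynb` files (raw JSON or extracted cells) is not stated in the report.

```python
from pathlib import Path

LONG_LINE_THRESHOLD = 120  # characters; the threshold stated in this report


def count_long_lines(text: str, threshold: int = LONG_LINE_THRESHOLD) -> int:
    """Count lines strictly longer than `threshold` characters."""
    return sum(1 for line in text.splitlines() if len(line) > threshold)


def long_line_summary(
    files: dict[str, str], threshold: int = LONG_LINE_THRESHOLD
) -> tuple[list[tuple[str, int]], int]:
    """Rank files by long-line count; return (ranking, total long lines).

    Files without any long line are omitted, mirroring the table above.
    """
    counts = [(name, count_long_lines(text, threshold)) for name, text in files.items()]
    ranking = sorted(
        (item for item in counts if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,  # largest first; ties keep their input order
    )
    return ranking, sum(n for _, n in ranking)


def read_sources(root: str, suffixes: tuple[str, ...]) -> dict[str, str]:
    """Read every file under `root` whose suffix is in `suffixes`, as raw text."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="replace")
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in suffixes
    }


if __name__ == "__main__":
    demo = {"short.py": "print('hi')\n", "wide.py": "x = " + "1 + " * 40 + "1\n"}
    ranking, total = long_line_summary(demo)
    print(ranking, total)  # [('wide.py', 1)] 1
```

To scan a checkout instead of the in-memory demo, pass `read_sources(".", (".py", ".ipynb", ".html"))` to `long_line_summary`. Note that `len` counts Unicode code points, so lines containing multi-code-point emoji may be measured slightly differently from other tools.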