awslabs / ml-io
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
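The size bands listed above can be expressed as a small lookup. The following is a minimal sketch, not the analysis tool's own code; the names `SIZE_BANDS` and `classify_file_size` are hypothetical and chosen only for illustration.

```python
# Size bands from the intro: (inclusive upper bound in lines of code, label).
# Both names below are illustrative, not taken from the analysis tool.
SIZE_BANDS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def classify_file_size(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("lines_of_code must be at least 1")
    for upper_bound, label in SIZE_BANDS:
        if lines_of_code <= upper_bound:
            return label
    # Anything above the last bound falls in the 1001+ band.
    return "very long"
```

Under these bounds, a 1,354-line file such as xxh3.h is classified as "very long", a 200-line file as "small", and a 100-line file as "very small".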
File Size Overall
  • There are 243 files with 16,244 lines of code.
    • 1 very long file (1,354 lines of code)
    • 4 long files (2,668 lines of code)
    • 5 medium size files (1,255 lines of code)
    • 27 small files (3,721 lines of code)
    • 206 very small files (7,246 lines of code)
8% | 16% | 7% | 22% | 44%
Legend:
1001+
501-1000
201-500
101-200
1-100
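The percentages in the bar follow from the per-category line counts listed under File Size Overall. The sketch below recomputes them; it assumes the shares are truncated to whole percents rather than rounded, which is what the reported figures suggest. The names `lines_per_band` and `band_shares` are hypothetical.

```python
# Lines of code per size band, as reported in the overall summary.
lines_per_band = {
    "1001+": 1354,
    "501-1000": 2668,
    "201-500": 1255,
    "101-200": 3721,
    "1-100": 7246,
}


def band_shares(counts: dict) -> dict:
    """Share of total lines per band, truncated to a whole percent (assumption)."""
    total = sum(counts.values())
    return {band: lines * 100 // total for band, lines in counts.items()}


shares = band_shares(lines_per_band)
print(" | ".join(f"{pct}%" for pct in shares.values()))
```

Because each share is truncated, the five values need not add up to exactly 100%.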


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
h | 21% | 0% | 4% | 14% | 60%
cc | 0% | 21% | 11% | 29% | 37%
c | 0% | 100% | 0% | 0% | 0%
py | 0% | 0% | 0% | 47% | 52%
proto | 0% | 0% | 0% | 100% | 0%
yaml | 0% | 0% | 0% | 71% | 28%
cmake | 0% | 0% | 0% | 0% | 100%
in | 0% | 0% | 0% | 0% | 100%
yml | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
src/mlio-py/mlio | 23% | 26% | 12% | 16% | 20%
src/mlio | 0% | 36% | 16% | 17% | 28%
include/mlio | 0% | 0% | 0% | 39% | 60%
src/mlio/data_stores | 0% | 0% | 0% | 52% | 47%
src/mlio/record_readers | 0% | 0% | 0% | 23% | 76%
src/mlio/instance_readers | 0% | 0% | 0% | 27% | 72%
src/mlio/streams | 0% | 0% | 0% | 22% | 78%
include/mlio/memory | 0% | 0% | 0% | 32% | 67%
src/mlio/integ | 0% | 0% | 0% | 100% | 0%
src/mlio/util | 0% | 0% | 0% | 72% | 27%
src/mlio/detail | 0% | 0% | 0% | 35% | 64%
src/mlio/memory | 0% | 0% | 0% | 29% | 70%
packaging/conda/recipe | 0% | 0% | 0% | 68% | 32%
include/mlio/streams | 0% | 0% | 0% | 0% | 100%
include/mlio/data_stores | 0% | 0% | 0% | 0% | 100%
include/mlio/record_readers | 0% | 0% | 0% | 0% | 100%
include/mlio/util | 0% | 0% | 0% | 0% | 100%
cmake | 0% | 0% | 0% | 0% | 100%
include/mlio/detail | 0% | 0% | 0% | 0% | 100%
include | 0% | 0% | 0% | 0% | 100%
src/mlio-py | 0% | 0% | 0% | 0% | 100%
src | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
include/mlio/integ | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
xxh3.h | src/mlio-py/mlio/contrib/insights/hll | 1354 | 53
xxhash.c | src/mlio-py/mlio/contrib/insights/hll | 905 | 1
recordio_protobuf_reader.cc | src/mlio | 618 | 16
data_reader.cc | src/mlio-py/mlio/core | 595 | 14
csv_reader.cc | src/mlio | 550 | 14
image_reader.cc | src/mlio | 331 | 10
xxhash.h | src/mlio-py/mlio/contrib/insights/hll | 261 | -
data_store.cc | src/mlio-py/mlio/core | 234 | 6
py_buffer.cc | src/mlio-py/mlio/core | 226 | 17
parallel_data_reader.cc | src/mlio | 203 | 9
csv_record_reader.cc | src/mlio/record_readers | 200 | 4
s3_client.cc | src/mlio | 189 | 10
intrusive_ptr.h | include/mlio | 188 | 10
core_instance_reader.cc | src/mlio/instance_readers | 178 | 7
data_type.h | include/mlio | 176 | 1
error.cc | src/mlio-py/mlio/core | 165 | 6
sagemaker_pipe_input_stream.cc | src/mlio/streams | 165 | 6
py_device_array.cc | src/mlio-py/mlio/core | 159 | 2
span.h | include/mlio | 159 | 20
tensor.cc | src/mlio | 156 | 12
__init__.py | src/mlio-py/mlio | 135 | 3
dlpack.cc | src/mlio/integ | 135 | 5
memory_slice.h | include/mlio/memory | 135 | 15
tensor.h | include/mlio | 132 | 12
module.cc | src/mlio-py/mlio/contrib/insights | 129 | 1
stream.cc | src/mlio-py/mlio/core | 129 | 11
tensor.cc | src/mlio-py/mlio/core | 126 | 3
number.cc | src/mlio/util | 115 | 3
s3_object.cc | src/mlio/data_stores | 111 | 4
csv_record_tokenizer.cc | src/mlio | 109 | 4
column_analyzer.cc | src/mlio-py/mlio/contrib/insights | 108 | 1
instance.cc | src/mlio | 108 | 3
recordio_protobuf.proto | src/mlio/detail/protobuf | 104 | -
file_list.cc | src/mlio/data_stores | 103 | 5
schema.h | include/mlio | 103 | 4
file_backed_memory_block.cc | src/mlio/memory | 102 | 3
meta.yaml | packaging/conda/recipe | 102 | -
file_input_stream.cc | src/mlio/streams | 100 | 6
csv_reader.h | include/mlio | 95 | 1
cpu_array.h | include/mlio | 92 | 8
device.h | include/mlio | 90 | 7
schema.cc | src/mlio | 88 | 4
hyperloglog.cc | src/mlio-py/mlio/contrib/insights/hll | 85 | 5
device_array.h | include/mlio | 83 | 7
shuffled_instance_reader.cc | src/mlio/instance_readers | 79 | 4
arrow_file.cc | src/mlio-py/mlio/integ/arrow | 77 | 8
py_memory_block.h | src/mlio-py/mlio/core | 74 | 9
schema.cc | src/mlio-py/mlio/core | 72 | 2
s3_input_stream.cc | src/mlio/streams | 72 | 6
coo_tensor_builder.h | src/mlio | 71 | 4
Files With Most Units (Top 20)
File | Location | # lines | # units
xxh3.h | src/mlio-py/mlio/contrib/insights/hll | 1354 | 53
span.h | include/mlio | 159 | 20
py_buffer.cc | src/mlio-py/mlio/core | 226 | 17
recordio_protobuf_reader.cc | src/mlio | 618 | 16
memory_slice.h | include/mlio/memory | 135 | 15
data_reader.cc | src/mlio-py/mlio/core | 595 | 14
csv_reader.cc | src/mlio | 550 | 14
tensor.cc | src/mlio | 156 | 12
tensor.h | include/mlio | 132 | 12
stream.cc | src/mlio-py/mlio/core | 129 | 11
s3_client.cc | src/mlio | 189 | 10
image_reader.cc | src/mlio | 331 | 10
intrusive_ptr.h | include/mlio | 188 | 10
py_memory_block.h | src/mlio-py/mlio/core | 74 | 9
parallel_data_reader.cc | src/mlio | 203 | 9
endian.h | include/mlio | 65 | 9
arrow_file.cc | src/mlio-py/mlio/integ/arrow | 77 | 8
tensor_visitor.cc | src/mlio | 40 | 8
logger.h | src/mlio | 55 | 8
memory_input_stream.cc | src/mlio/streams | 61 | 8
Files With Long Lines (Top 5)

There are 5 files with lines longer than 120 characters. In total, there are 19 long lines.

File | Location | # lines | # units | # long lines
recordio_protobuf_reader.cc | src/mlio | 618 | 16 | 8
image_reader.cc | src/mlio | 331 | 10 | 6
csv_reader.cc | src/mlio | 550 | 14 | 3
recordio_record_reader.cc | src/mlio/record_readers | 45 | - | 1
file_backed_memory_allocator.cc | src/mlio/memory | 61 | 5 | 1