awslabs / aws-data-wrangler
Conditional Complexity

The distribution of complexity of units (measured with McCabe index).

Intro
  • Conditional complexity (also called cyclomatic complexity) is a measure of software complexity. It refers to the number of possible paths through a program function. A higher value often means higher maintenance and testing costs (infosecinstitute.com).
  • Conditional complexity is calculated by counting all conditions in the program that can affect the execution path (e.g. if statements, loops, switches, and/or operators, try and catch blocks...).
  • Conditional complexity is measured at the unit level (methods, functions...).
  • Units are classified in five categories based on the measured McCabe index: 1-5 (very simple units), 6-10 (simple units), 11-25 (medium complex units), 26-50 (complex units), 51+ (very complex units).
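As a rough illustration of the counting rule described above, the following sketch approximates the McCabe index of a Python snippet with the standard `ast` module. The function name `mccabe_index`, the `SAMPLE` snippet, and the exact set of counted node types are assumptions made for this example; the tool that produced this report may count decision points differently.

```python
import ast

# Node types treated as one decision point each. This selection is an
# assumption of the sketch, not the rule set used by any particular tool.
DECISION_NODES = (
    ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.comprehension,
)


def mccabe_index(source: str) -> int:
    """Approximate the McCabe index: 1 plus the number of decision points."""
    index = 1  # a unit with no branches has exactly one path
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            index += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` holds three operands, i.e. two extra branches
            index += len(node.values) - 1
    return index


# Hypothetical unit used only to exercise the counter.
SAMPLE = '''
def pick(values, strict):
    if values and strict:
        return values[0]
    for value in values:
        if value is None:
            continue
    return None
'''

print(mccabe_index(SAMPLE))  # 1 + if + and + for + if = 5
```

With an index of 5, the sample unit would fall in the lowest band (1-5).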
Conditional Complexity Overall
  • There are 557 units with 4,796 lines of code in units (36.7% of code).
    • 0 very complex units (0 lines of code)
    • 1 complex unit (280 lines of code)
    • 9 medium complex units (267 lines of code)
    • 20 simple units (373 lines of code)
    • 527 very simple units (3,876 lines of code)
0% | 5% | 5% | 7% | 80%
Legend (McCabe index): 51+ | 26-50 | 11-25 | 6-10 | 1-5
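The percentage bar can be reproduced from the line counts in the summary. The sketch below assumes that the categories map onto the legend bands in the order listed (very complex = 51+, down to very simple = 1-5) and that the report truncates percentages rather than rounding them; both are inferences from the numbers shown, not documented behavior.

```python
# Lines of code per band, copied from the overall summary.
LOC_PER_BAND = {
    "51+": 0,
    "26-50": 280,
    "11-25": 267,
    "6-10": 373,
    "1-5": 3876,
}

total = sum(LOC_PER_BAND.values())  # 4796, the reported lines of code in units

# Integer division truncates, which matches the bar (rounding would not).
shares = {band: loc * 100 // total for band, loc in LOC_PER_BAND.items()}

print(" | ".join(f"{share}%" for share in shares.values()))
# 0% | 5% | 5% | 7% | 80%
```

Because each share is truncated, the displayed values add up to 97% rather than 100%.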
Conditional Complexity per Extension
Extension | 51+ | 26-50 | 11-25 | 6-10 | 1-5
py | 0% | 5% | 5% | 7% | 80%
Conditional Complexity per Logical Component
primary logical decomposition
Component | 51+ | 26-50 | 11-25 | 6-10 | 1-5
awswrangler | 0% | 15% | 9% | 6% | 68%
awswrangler/s3 | 0% | 0% | 6% | 14% | 78%
awswrangler/opensearch | 0% | 0% | 10% | 9% | 79%
awswrangler/data_api | 0% | 0% | 0% | 22% | 77%
awswrangler/catalog | 0% | 0% | 0% | 1% | 98%
awswrangler/athena | 0% | 0% | 0% | 0% | 100%
awswrangler/quicksight | 0% | 0% | 0% | 0% | 100%
awswrangler/lakeformation | 0% | 0% | 0% | 0% | 100%
awswrangler/dynamodb | 0% | 0% | 0% | 0% | 100%
Most Complex Units
Top 20 most complex units
Unit | File | # lines | McCabe index | # params
def _build_cluster_args() | awswrangler/emr.py | 280 | 50 | 1
def athena2pyarrow() | awswrangler/_data_types.py | 39 | 21 | 1
def _fetch() | awswrangler/s3/_fs.py | 58 | 21 | 3
def _get_endpoint_url() | awswrangler/_utils.py | 23 | 21 | 1
def _apply_index() | awswrangler/s3/_read_parquet.py | 27 | 18 | 3
def athena2pandas() | awswrangler/_data_types.py | 27 | 15 | 1
def pyarrow2timestream() | awswrangler/_data_types.py | 20 | 13 | 1
def athena2quicksight() | awswrangler/_data_types.py | 27 | 13 | 1
def _df_doc_generator() | awswrangler/opensearch/_write.py | 19 | 13 | 1
def apply_configs() | awswrangler/_config.py | 27 | 11 | 2
def _cast_pandas_column() | awswrangler/_data_types.py | 23 | 9 | 4
def _union() | awswrangler/s3/_read.py | 14 | 9 | 2
def _apply_timezone() | awswrangler/s3/_read_parquet.py | 15 | 9 | 3
def flush() | awswrangler/s3/_fs.py | 43 | 9 | 2
def cast_pandas_with_athena_types() | awswrangler/_data_types.py | 18 | 8 | 3
def _get_value_hash() | awswrangler/s3/_write_dataset.py | 23 | 8 | 3
def _extract_partitions_from_path() | awswrangler/s3/_read.py | 12 | 8 | 2
def readline() | awswrangler/s3/_fs.py | 16 | 8 | 2
def _cast_value() | awswrangler/timestream.py | 16 | 8 | 2
def _cast2date() | awswrangler/_data_types.py | 8 | 7 | 1
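As a cross-check, the McCabe values of the 20 units listed above can be bucketed into the legend bands and compared with the overall summary. This is a minimal sketch; the helper name `band` is hypothetical and the thresholds are taken from the legend labels.

```python
from collections import Counter

# McCabe index of each unit in the top-20 list, in listed order.
TOP_20 = [50, 21, 21, 21, 18, 15, 13, 13, 13, 11,
          9, 9, 9, 9, 8, 8, 8, 8, 8, 7]


def band(index: int) -> str:
    """Map a McCabe index to the band labels used in the legend."""
    if index >= 51:
        return "51+"
    if index >= 26:
        return "26-50"
    if index >= 11:
        return "11-25"
    if index >= 6:
        return "6-10"
    return "1-5"


counts = Counter(band(i) for i in TOP_20)
print(counts["26-50"], counts["11-25"], counts["6-10"])  # 1 9 10
```

The single unit in 26-50 and the nine units in 11-25 agree with the overall summary (1 complex, 9 medium complex); the ten units in 6-10 are half of the 20 simple units reported, the rest falling outside the top 20.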