data_measurements/dataset_statistics.py (7 lines):
- line 47: # TODO: Read this in depending on chosen language / expand beyond english
- line 128: # TODO: Not being used anymore; make sure & remove.
- line 135: # TODO: Not being used anymore. Make sure and remove
- line 305: # TODO: Tighten the rest of this similar to text_duplicates.
- line 458: # TODO: Are we not using this anymore?
- line 517: # TODO: Does z even need to be self?
- line 575: # TODO: Add warnings (which words are missing) to log file?

data_measurements/npmi/npmi.py (4 lines):
- line 34: # TODO: Should be possible for a user to specify this.
- line 82: # TODO: Users ideally can type in whatever words they want.
- line 89: # TODO: Let users specify
- line 289: # TODO: Change this logic so just the vocabulary is given.

npmi/npmi.py (4 lines):
- line 15: # TODO: Change print statements to logging?
- line 47: # TODO: Is this necessary?
- line 75: # TODO: Create docs for this.
- line 186: # TODO: Is this better?

app.py (3 lines):
- line 133: # TODO: If these are cached, can't we just show them by default?
- line 165: # TODO: Fix how this is a weird outlier.
- line 178: # # TODO: Make this less of a weird outlier.

data_measurements/labels/labels.py (3 lines):
- line 20: # TODO: This should ideally be in what's returned from the evaluate library
- line 164: # TODO: Handle the case where there are multiple label columns.
- line 226: # TODO: Incorporate this summation into what the evaluate library returns.

utils/gradio_utils.py (3 lines):
- line 124: # TODO @yacine: Explain what this is doing and why eg tp[0] could = "id"
- line 280: # TODO: Check if this is slow when the size is large --
- line 405: # TODO: Nice UI version of the content in the comments.

data_measurements/text_duplicates/text_duplicates.py (2 lines):
- line 18: # This isn't in the evaluate measurement, but TODO to add that...
- line 86: # TODO: Use df_to_html rather than write_json_as_html;

data_measurements/zipf/zipf.py (2 lines):
- line 127: # TODO: These proportions may have already been calculated.
- line 179: # TODO: This might fit better in its own file handling class?

run_data_measurements.py (2 lines):
- line 58: # TODO: Catch error exceptions for each measurement, so that an error
- line 304: # TODO: print out local or hub cache directory location.

widgets/text_lengths.py (1 line):
- line 55: # TODO: Add text on choosing the length you want to the dropdown.

data_measurements/lengths/lengths.py (1 line):
- line 100: # This is a hack to handle a UI display error (TODO: file bug)

data_measurements/embeddings/embeddings.py (1 line):
- line 254: # TODO: batch across second dimension

data_measurements/perplexity/perplexity.py (1 line):
- line 40: # TODO: What other stuff might be useful to grab?

utils/dataset_utils.py (1 line):
- line 353: # TODO: Breaks the CLI if this isn't checked.