apache / nutch
Apache Nutch is an extensible and scalable web crawler
56K lines of main code (681 files)
12K lines of test code (132 files)
9.3K lines of other code (151 files)
Age: 20y (7,408 days)
Main code touched in 1 year: 27% (16K LOC)
New main code in 1 year: 0% (0 LOC)
48K java
7.8K xml
0.5K html
0.2K xsd
0.07K xsl
0.04K rss

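[Two per-year bar charts; chart titles not preserved. Values are listed in the order of the year axis.]
Year:    2025 2024 2023 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006
Chart 1: 7 75 78 64 100 172 171 344 237 149 225 132 93 138 244 199 102 66 157 482
Chart 2: 3 9 7 5 6 9 14 25 28 21 12 7 6 6 7 6 7 6 6 8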

generated by sokrates.dev (configuration) on 2025-05-07