apache / incubator-stormcrawler
File Age & Freshness

File age measurements show the distribution of file age (days since a file's first commit) and file freshness (days since its latest commit).
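The two measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the report generator's actual code: the function names are hypothetical, the bucket boundaries are taken from the legends in this report, and the reference date used in the usage note is assumed because the report does not state when it was generated.

```python
from datetime import date

# Bucket boundaries copied from the report's legend: (label, min_days, max_days).
BUCKETS = [
    ("1-30", 1, 30),
    ("31-90", 31, 90),
    ("91-180", 91, 180),
    ("181-365", 181, 365),
]


def bucket(days: int) -> str:
    """Map a day count onto one of the report's five ranges."""
    for label, low, high in BUCKETS:
        if low <= days <= high:
            return label
    # Assumption: anything above 365 days belongs to the open-ended bucket;
    # a count of 0 (commit on the reference day) is folded into "1-30".
    return "366+" if days > 365 else "1-30"


def age_and_freshness(first_commit: date, latest_commit: date, today: date):
    """Return (age bucket, freshness bucket) for one file."""
    age_days = (today - first_commit).days        # days since first commit
    fresh_days = (today - latest_commit).days     # days since latest commit
    return bucket(age_days), bucket(fresh_days)
```

With an assumed reference date of 2025-04-15, a file created on 2024-03-28 and last modified on 2025-02-28 (the dates listed for SitemapFilter.java) lands in the "366+" age bucket and the "31-90" freshness bucket.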

Summary
File Change History Overall
File Age Distribution Overall
Days since first update
  • There are 110 files containing 10,808 lines of code.
    • 110 files that are 366+ days old (10,808 lines of code)
    • 0 files that are 181-365 days old (0 lines of code)
    • 0 files that are 91-180 days old (0 lines of code)
    • 0 files that are 31-90 days old (0 lines of code)
    • 0 files that are 1-30 days old (0 lines of code)
Days | 366+ | 181-365 | 91-180 | 31-90 | 1-30
Share | 100% | 0% | 0% | 0% | 0%

File Freshness Distribution Overall
Days since last update
  • There are 110 files containing 10,808 lines of code.
    • 83 files have been last changed 366+ days ago (5,954 lines of code)
    • 16 files have been last changed 181-365 days ago (2,249 lines of code)
    • 10 files have been last changed 91-180 days ago (2,556 lines of code)
    • 1 file has been last changed 31-90 days ago (49 lines of code)
    • 0 files have been last changed 1-30 days ago (0 lines of code)
Days | 366+ | 181-365 | 91-180 | 31-90 | 1-30
Share | 55% | 20% | 23% | <1% | 0%
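The percentages in this section appear to be shares of lines of code rather than of file counts: 5,954 of 10,808 lines is about 55%, whereas 83 of 110 files would be about 75%. The sketch below reproduces the freshness row from the line counts listed above. The helper name `format_share` is hypothetical, and the rounding rule (truncate to a whole percent, print "<1%" for small non-zero shares) is inferred from the figures, not documented by the report.

```python
# Lines of code per freshness bucket, copied from the list above.
loc_by_bucket = {
    "366+": 5954,
    "181-365": 2249,
    "91-180": 2556,
    "31-90": 49,
    "1-30": 0,
}


def format_share(part: int, total: int) -> str:
    """Render a share as a whole percentage, rounding down.

    Assumption: the report truncates rather than rounds (20.8% is shown
    as 20%) and prints "<1%" for non-zero shares below one percent.
    """
    if part == 0:
        return "0%"
    percent = part * 100 // total  # integer floor of the percentage
    return "<1%" if percent == 0 else f"{percent}%"


total = sum(loc_by_bucket.values())
shares = " | ".join(format_share(n, total) for n in loc_by_bucket.values())
print(shares)
```

The same calculation applied to the two xml files listed further down (3 and 37 lines) gives 7% and 92%, which matches the per-extension freshness row for xml.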

File Change History per File Extension
java, xml, yaml, md, json, txt, html, sh, flux, gitignore, groovy, gitattributes, rss, properties
File Age Distribution per Extension
Days since first update
Extension | 366+ | 181-365 | 91-180 | 31-90 | 1-30
java | 100% | 0% | 0% | 0% | 0%
yaml | 100% | 0% | 0% | 0% | 0%
flux | 100% | 0% | 0% | 0% | 0%
xml | 100% | 0% | 0% | 0% | 0%
File Freshness Distribution per Extension
Days since last update
Extension | 366+ | 181-365 | 91-180 | 31-90 | 1-30
java | 55% | 20% | 23% | <1% | 0%
flux | 100% | 0% | 0% | 0% | 0%
xml | 7% | 0% | 92% | 0% | 0%
yaml | 0% | 100% | 0% | 0% | 0%
File Change History per Logical Decomposition
primary
primary (file age distribution)
Days since first update
Component | 366+ | 181-365 | 91-180 | 31-90 | 1-30
core | 100% | 0% | 0% | 0% | 0%
archetype | 100% | 0% | 0% | 0% | 0%
primary (file freshness distribution)
Days since last update
Component | 366+ | 181-365 | 91-180 | 31-90 | 1-30
core | 55% | 20% | 23% | <1% | 0%
archetype | 54% | 28% | 17% | 0% | 0%
Oldest Files (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
crawler-default.yaml
in core/src/main/resources
85 - 2015-10-15 2024-05-21 83 11 julien@digitalpebble.com rzo1@apache.org
crawler-conf.yaml
in archetype/src/main/resources/archetype-resources
62 - 2015-12-11 2024-05-21 52 7 julien@digitalpebble.com rzo1@apache.org
archetype-metadata.xml
in archetype/src/main/resources/META-INF/maven
37 - 2015-12-11 2024-11-13 11 3 julien@digitalpebble.com mvolikas@gmail.com
default-regex-normalizers.xml
in archetype/src/main/resources/archetype-resources/src/main/resources
3 - 2015-12-11 2015-12-17 2 1 julien@digitalpebble.com julien@digitalpebble.com
crawler.flux
in archetype/src/main/resources/archetype-resources
115 - 2016-06-16 2024-03-28 13 2 julien@digitalpebble.com julien@digitalpebble.com
FetcherBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
714 23 2024-03-28 2024-11-22 3 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/okhttp
475 10 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
SimpleFetcherBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
397 7 2024-03-28 2024-11-22 3 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
JSoupParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
368 7 2024-03-28 2024-11-22 6 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
BasicURLNormalizer.java
in core/src/main/java/org/apache/stormcrawler/filtering/basic
297 8 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
SiteMapParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
287 7 2024-03-28 2024-09-10 4 4 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/httpclient
283 8 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
FastURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
245 12 2024-03-28 2024-12-02 4 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractIndexerBolt.java
in core/src/main/java/org/apache/stormcrawler/indexing
208 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DelegatorProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
192 12 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RegexURLNormalizer.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
186 6 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
FeedParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
184 5 2024-03-28 2024-09-10 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
XPathFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
175 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Metadata.java
in core/src/main/java/org/apache/stormcrawler
172 23 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
FileSpout.java
in core/src/main/java/org/apache/stormcrawler/spout
169 14 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractStatusUpdaterBolt.java
in core/src/main/java/org/apache/stormcrawler/persistence
166 5 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractQueryingSpout.java
in core/src/main/java/org/apache/stormcrawler/persistence
166 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CharsetIdentification.java
in core/src/main/java/org/apache/stormcrawler/util
157 8 2024-03-28 2024-11-22 3 2 julien@digitalpebble.com 13417392+rzo1@users.noreply...
TextExtractor.java
in core/src/main/java/org/apache/stormcrawler/parse
155 7 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
HttpRobotRulesParser.java
in core/src/main/java/org/apache/stormcrawler/protocol
151 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLFilters.java
in core/src/main/java/org/apache/stormcrawler/filtering
144 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DefaultScheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
138 6 2024-03-28 2024-12-02 4 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
ConfUtils.java
in core/src/main/java/org/apache/stormcrawler/util
138 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MultiProxyManager.java
in core/src/main/java/org/apache/stormcrawler/proxy
137 9 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
ParseFilters.java
in core/src/main/java/org/apache/stormcrawler/parse
134 10 2024-03-28 2024-09-10 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractHttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
127 9 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CollectionTagger.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
125 10 2024-03-28 2024-10-10 2 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
InitialisationUtil.java
in core/src/main/java/org/apache/stormcrawler/util
125 10 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AdaptiveScheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
124 2 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
MemorySpout.java
in core/src/main/java/org/apache/stormcrawler/spout
122 10 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLPartitionerBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
115 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
JSoupFilters.java
in core/src/main/java/org/apache/stormcrawler/parse
112 8 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
SchedulingURLBuffer.java
in core/src/main/java/org/apache/stormcrawler/persistence/urlbuffer
112 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RobotRulesParser.java
in core/src/main/java/org/apache/stormcrawler/protocol
111 6 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
FileResponse.java
in core/src/main/java/org/apache/stormcrawler/protocol/file
107 5 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
RegexURLFilterBase.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
107 4 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Protocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
105 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
SCProxy.java
in core/src/main/java/org/apache/stormcrawler/proxy
104 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RobotsTags.java
in core/src/main/java/org/apache/stormcrawler/util
101 10 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLUtil.java
in core/src/main/java/org/apache/stormcrawler/util
100 8 2024-03-28 2024-10-10 3 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
RemoteDriverProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/selenium
90 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CookieConverter.java
in core/src/main/java/org/apache/stormcrawler/util
88 2 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
LDJsonParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
87 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
XPathFilter.java
in core/src/main/java/org/apache/stormcrawler/jsoup
87 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MetadataTransfer.java
in core/src/main/java/org/apache/stormcrawler/util
84 5 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
Files Not Recently Changed (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
default-regex-normalizers.xml
in archetype/src/main/resources/archetype-resources/src/main/resources
3 - 2015-12-11 2015-12-17 2 1 julien@digitalpebble.com julien@digitalpebble.com
EmptyQueueListener.java
in core/src/main/java/org/apache/stormcrawler/persistence
5 - 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ProxyManager.java
in core/src/main/java/org/apache/stormcrawler/proxy
7 - 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
JSoupFilter.java
in core/src/main/java/org/apache/stormcrawler/parse
10 - 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RegexRule.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
11 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
NavigationFilter.java
in core/src/main/java/org/apache/stormcrawler/protocol/selenium
11 - 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering
13 - 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Status.java
in core/src/main/java/org/apache/stormcrawler/persistence
14 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
JSONResource.java
in core/src/main/java/org/apache/stormcrawler
16 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
AbstractConfigurable.java
in core/src/main/java/org/apache/stormcrawler/util
17 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MemoryStatusUpdater.java
in core/src/main/java/org/apache/stormcrawler/persistence
17 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
StdOutStatusUpdater.java
in core/src/main/java/org/apache/stormcrawler/persistence
19 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Constants.java
in core/src/main/java/org/apache/stormcrawler
19 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CollectionMetric.java
in core/src/main/java/org/apache/stormcrawler/util
20 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RegexURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
22 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
SelfURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/basic
23 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DNSResolutionListener.java
in core/src/main/java/org/apache/stormcrawler/protocol/okhttp
26 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DummyIndexer.java
in core/src/main/java/org/apache/stormcrawler/indexing
27 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RefreshTag.java
in core/src/main/java/org/apache/stormcrawler/util
31 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
FileProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/file
32 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
HttpHeaders.java
in core/src/main/java/org/apache/stormcrawler/protocol
33 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Outlink.java
in core/src/main/java/org/apache/stormcrawler/parse
35 9 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
StringTabScheme.java
in core/src/main/java/org/apache/stormcrawler/util
36 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DomainParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
36 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DebugParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
39 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Configurable.java
in core/src/main/java/org/apache/stormcrawler/util
41 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CommaSeparatedToMultivaluedMetadata.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
42 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
SimpleURLBuffer.java
in core/src/main/java/org/apache/stormcrawler/persistence/urlbuffer
43 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ParseData.java
in core/src/main/java/org/apache/stormcrawler/parse
45 10 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MaxDepthFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/depth
49 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MetadataFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/metadata
49 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RobotsFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/robots
51 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLStreamGrouping.java
in core/src/main/java/org/apache/stormcrawler/util
54 4 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
StdOutIndexer.java
in core/src/main/java/org/apache/stormcrawler/indexing
54 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MD5SignatureParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
57 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ConfigurableHelper.java
in core/src/main/java/org/apache/stormcrawler/util
58 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RobotRules.java
in core/src/main/java/org/apache/stormcrawler/protocol
60 13 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
NavigationFilters.java
in core/src/main/java/org/apache/stormcrawler/protocol/selenium
64 4 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
AbstractURLBuffer.java
in core/src/main/java/org/apache/stormcrawler/persistence/urlbuffer
64 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ConfigurableTopology.java
in core/src/main/java/org/apache/stormcrawler
66 4 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DocumentFragmentBuilder.java
in core/src/main/java/org/apache/stormcrawler/parse
69 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ProtocolFactory.java
in core/src/main/java/org/apache/stormcrawler/protocol
71 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLPartitioner.java
in core/src/main/java/org/apache/stormcrawler/util
71 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
StatusEmitterBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
73 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
PriorityURLBuffer.java
in core/src/main/java/org/apache/stormcrawler/persistence/urlbuffer
73 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
LinkParseFilter.java
in core/src/main/java/org/apache/stormcrawler/jsoup
74 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
LinkParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
76 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLFilterBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
78 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
LDJsonParseFilter.java
in core/src/main/java/org/apache/stormcrawler/jsoup
79 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ParseResult.java
in core/src/main/java/org/apache/stormcrawler/parse
79 13 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Most Recently Created Files (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
FetcherBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
714 23 2024-03-28 2024-11-22 3 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/okhttp
475 10 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
SimpleFetcherBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
397 7 2024-03-28 2024-11-22 3 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
JSoupParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
368 7 2024-03-28 2024-11-22 6 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
BasicURLNormalizer.java
in core/src/main/java/org/apache/stormcrawler/filtering/basic
297 8 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
SiteMapParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
287 7 2024-03-28 2024-09-10 4 4 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/httpclient
283 8 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
FastURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
245 12 2024-03-28 2024-12-02 4 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractIndexerBolt.java
in core/src/main/java/org/apache/stormcrawler/indexing
208 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DelegatorProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
192 12 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RegexURLNormalizer.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
186 6 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
FeedParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
184 5 2024-03-28 2024-09-10 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
XPathFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
175 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Metadata.java
in core/src/main/java/org/apache/stormcrawler
172 23 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
FileSpout.java
in core/src/main/java/org/apache/stormcrawler/spout
169 14 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractStatusUpdaterBolt.java
in core/src/main/java/org/apache/stormcrawler/persistence
166 5 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractQueryingSpout.java
in core/src/main/java/org/apache/stormcrawler/persistence
166 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CharsetIdentification.java
in core/src/main/java/org/apache/stormcrawler/util
157 8 2024-03-28 2024-11-22 3 2 julien@digitalpebble.com 13417392+rzo1@users.noreply...
TextExtractor.java
in core/src/main/java/org/apache/stormcrawler/parse
155 7 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
HttpRobotRulesParser.java
in core/src/main/java/org/apache/stormcrawler/protocol
151 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLFilters.java
in core/src/main/java/org/apache/stormcrawler/filtering
144 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DefaultScheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
138 6 2024-03-28 2024-12-02 4 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
ConfUtils.java
in core/src/main/java/org/apache/stormcrawler/util
138 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MultiProxyManager.java
in core/src/main/java/org/apache/stormcrawler/proxy
137 9 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
ParseFilters.java
in core/src/main/java/org/apache/stormcrawler/parse
134 10 2024-03-28 2024-09-10 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractHttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
127 9 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CollectionTagger.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
125 10 2024-03-28 2024-10-10 2 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
InitialisationUtil.java
in core/src/main/java/org/apache/stormcrawler/util
125 10 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AdaptiveScheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
124 2 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
MemorySpout.java
in core/src/main/java/org/apache/stormcrawler/spout
122 10 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLPartitionerBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
115 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
JSoupFilters.java
in core/src/main/java/org/apache/stormcrawler/parse
112 8 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
SchedulingURLBuffer.java
in core/src/main/java/org/apache/stormcrawler/persistence/urlbuffer
112 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RobotRulesParser.java
in core/src/main/java/org/apache/stormcrawler/protocol
111 6 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
FileResponse.java
in core/src/main/java/org/apache/stormcrawler/protocol/file
107 5 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
RegexURLFilterBase.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
107 4 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Protocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
105 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
SCProxy.java
in core/src/main/java/org/apache/stormcrawler/proxy
104 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
RobotsTags.java
in core/src/main/java/org/apache/stormcrawler/util
101 10 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLUtil.java
in core/src/main/java/org/apache/stormcrawler/util
100 8 2024-03-28 2024-10-10 3 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
RemoteDriverProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/selenium
90 3 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
CookieConverter.java
in core/src/main/java/org/apache/stormcrawler/util
88 2 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
LDJsonParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
87 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
XPathFilter.java
in core/src/main/java/org/apache/stormcrawler/jsoup
87 6 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MetadataTransfer.java
in core/src/main/java/org/apache/stormcrawler/util
84 5 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HostURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/host
80 1 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ParseResult.java
in core/src/main/java/org/apache/stormcrawler/parse
79 13 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
LDJsonParseFilter.java
in core/src/main/java/org/apache/stormcrawler/jsoup
79 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLFilterBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
78 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
LinkParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
76 2 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Most Recently Changed Files (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
SitemapFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/sitemap
49 1 2024-03-28 2025-02-28 3 3 13417392+rzo1@users.noreply... tallison@apache.org
FastURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
245 12 2024-03-28 2024-12-02 4 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
DefaultScheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
138 6 2024-03-28 2024-12-02 4 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
FetcherBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
714 23 2024-03-28 2024-11-22 3 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
SimpleFetcherBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
397 7 2024-03-28 2024-11-22 3 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
JSoupParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
368 7 2024-03-28 2024-11-22 6 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
BasicURLNormalizer.java
in core/src/main/java/org/apache/stormcrawler/filtering/basic
297 8 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
AbstractStatusUpdaterBolt.java
in core/src/main/java/org/apache/stormcrawler/persistence
166 5 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
CharsetIdentification.java
in core/src/main/java/org/apache/stormcrawler/util
157 8 2024-03-28 2024-11-22 3 2 julien@digitalpebble.com 13417392+rzo1@users.noreply...
ProtocolResponse.java
in core/src/main/java/org/apache/stormcrawler/protocol
37 3 2024-03-28 2024-11-22 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
archetype-metadata.xml
in archetype/src/main/resources/META-INF/maven
37 - 2015-12-11 2024-11-13 11 3 julien@digitalpebble.com mvolikas@gmail.com
CollectionTagger.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
125 10 2024-03-28 2024-10-10 2 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
URLUtil.java
in core/src/main/java/org/apache/stormcrawler/util
100 8 2024-03-28 2024-10-10 3 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
ParseFilter.java
in core/src/main/java/org/apache/stormcrawler/parse
13 2 2024-03-28 2024-10-10 3 3 13417392+rzo1@users.noreply... psxjoy@outlook.com
SiteMapParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
287 7 2024-03-28 2024-09-10 4 4 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
FeedParserBolt.java
in core/src/main/java/org/apache/stormcrawler/bolt
184 5 2024-03-28 2024-09-10 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
ParseFilters.java
in core/src/main/java/org/apache/stormcrawler/parse
134 10 2024-03-28 2024-09-10 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/okhttp
475 10 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
HttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/httpclient
283 8 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
FileSpout.java
in core/src/main/java/org/apache/stormcrawler/spout
169 14 2024-03-28 2024-05-21 3 3 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
InitialisationUtil.java
in core/src/main/java/org/apache/stormcrawler/util
125 10 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
crawler-default.yaml
in core/src/main/resources
85 - 2015-10-15 2024-05-21 83 11 julien@digitalpebble.com rzo1@apache.org
MetadataTransfer.java
in core/src/main/java/org/apache/stormcrawler/util
84 5 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
crawler-conf.yaml
in archetype/src/main/resources/archetype-resources
62 - 2015-12-11 2024-05-21 52 7 julien@digitalpebble.com rzo1@apache.org
BasicURLFilter.java
in core/src/main/java/org/apache/stormcrawler/filtering/basic
58 3 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
URLBuffer.java
in core/src/main/java/org/apache/stormcrawler/persistence/urlbuffer
42 4 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
Scheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
23 1 2024-03-28 2024-05-21 2 2 13417392+rzo1@users.noreply... 13417392+rzo1@users.noreply...
RegexURLNormalizer.java
in core/src/main/java/org/apache/stormcrawler/filtering/regex
186 6 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
TextExtractor.java
in core/src/main/java/org/apache/stormcrawler/parse
155 7 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
MultiProxyManager.java
in core/src/main/java/org/apache/stormcrawler/proxy
137 9 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
AdaptiveScheduler.java
in core/src/main/java/org/apache/stormcrawler/persistence
124 2 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
JSoupFilters.java
in core/src/main/java/org/apache/stormcrawler/parse
112 8 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
RobotRulesParser.java
in core/src/main/java/org/apache/stormcrawler/protocol
111 6 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
FileResponse.java
in core/src/main/java/org/apache/stormcrawler/protocol/file
107 5 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
CookieConverter.java
in core/src/main/java/org/apache/stormcrawler/util
88 2 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
SeleniumProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol/selenium
68 4 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
MimeTypeNormalization.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
36 1 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
PerSecondReducer.java
in core/src/main/java/org/apache/stormcrawler/util
35 3 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
SingleProxyManager.java
in core/src/main/java/org/apache/stormcrawler/proxy
35 3 2024-03-28 2024-05-03 2 3 13417392+rzo1@users.noreply... tallison314159@gmail.com
AbstractIndexerBolt.java
in core/src/main/java/org/apache/stormcrawler/indexing
208 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
DelegatorProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
192 12 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
XPathFilter.java
in core/src/main/java/org/apache/stormcrawler/parse/filter
175 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
Metadata.java
in core/src/main/java/org/apache/stormcrawler
172 23 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
AbstractQueryingSpout.java
in core/src/main/java/org/apache/stormcrawler/persistence
166 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
HttpRobotRulesParser.java
in core/src/main/java/org/apache/stormcrawler/protocol
151 5 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
URLFilters.java
in core/src/main/java/org/apache/stormcrawler/filtering
144 7 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
ConfUtils.java
in core/src/main/java/org/apache/stormcrawler/util
138 15 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
AbstractHttpProtocol.java
in core/src/main/java/org/apache/stormcrawler/protocol
127 9 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
MemorySpout.java
in core/src/main/java/org/apache/stormcrawler/spout
122 10 2024-03-28 2024-03-28 1 2 13417392+rzo1@users.noreply... julien@digitalpebble.com
crawler.flux
in archetype/src/main/resources/archetype-resources
115 - 2016-06-16 2024-03-28 13 2 julien@digitalpebble.com julien@digitalpebble.com
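Each entry in the file tables above spans three lines: the file name, its folder prefixed with "in", and a row of eight whitespace-separated metrics. For readers who want to process the listings, the following is a minimal parsing sketch. The `FileEntry` class and `parse_entry` function are hypothetical names introduced here for illustration, and the reading of "-" as "no unit count" is an assumption based on how the tables use it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    name: str
    folder: str
    lines: int
    units: Optional[int]  # "-" in the report is read as "no unit count"
    created: str
    last_modified: str
    changes: int
    contributors: int
    first_contributor: str
    latest_contributor: str


def parse_entry(name_line: str, folder_line: str, metrics_line: str) -> FileEntry:
    """Parse one three-line table entry into a FileEntry."""
    folder = folder_line.strip()
    if folder.startswith("in "):
        folder = folder[3:]  # drop the leading "in " marker
    fields = metrics_line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 metric fields, got {len(fields)}")
    lines, units, created, modified, changes, contributors, first, latest = fields
    return FileEntry(
        name=name_line.strip(),
        folder=folder,
        lines=int(lines),
        units=None if units == "-" else int(units),
        created=created,
        last_modified=modified,
        changes=int(changes),
        contributors=int(contributors),
        first_contributor=first,    # kept verbatim, including any "..." truncation
        latest_contributor=latest,
    )


entry = parse_entry(
    "crawler-default.yaml",
    "in core/src/main/resources",
    "85 - 2015-10-15 2024-05-21 83 11 julien@digitalpebble.com rzo1@apache.org",
)
print(entry.name, entry.lines, entry.changes)
```

Contributor addresses that the report truncates with "..." are stored as they appear; the sketch makes no attempt to complete them.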