apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
193K lines of main code (2.7K files)
97K lines of test code (1K files)
21K lines of other code (404 files)
Age: 11y (4,110 days)
20% main code touched in 1 year (39K LOC)
2% new main code in 1 year (3.9K LOC)
189K java
1K avsc
0.9K xml
0.9K js
0.3K yaml
0.3K css
0.2K html
0.2K xsl
0.1K sql
0.06K py
0.05K groovy

  34  197  215  147  249  286  279  303 1008 1401 1584  747
   7   20   21   21   30   30   35   38   54   62   49   13
2025 2024 2023 2022 2021 2020 2019 2018 2017 2016 2015 2014

generated by sokrates.dev (configuration) on 2025-05-07