Besides the projects, there are a few other distinct areas of Apache:
Incubator: for aspiring ASF projects
Attic: for retired ASF projects
INFRA - Apache Infrastructure Team: provides and manages all infrastructure and services for the Apache Software Foundation, and for each project at the Foundation
AGE: PostgreSQL extension that provides graph database functionality, enabling PostgreSQL users to combine graph query modeling with PostgreSQL's existing relational model
Airavata: a distributed system software framework to manage simple to composite applications with complex execution and workflow patterns on diverse computational resources
Airflow: Python-based platform to programmatically author, schedule and monitor workflows
Allura: Python-based open source implementation of a software forge
Ambari: makes Hadoop cluster provisioning, managing, and monitoring dead simple
AntUnit: Ant library that provides Ant tasks for testing Ant tasks; it can also be used to drive functional and integration tests of arbitrary applications with Ant
Ivy: a powerful dependency manager oriented toward Java dependency management, although it can be used to manage dependencies of any kind
IvyDE: Eclipse plugin that integrates Ivy into Eclipse
Jelly: simple yet powerful Java- and XML-based scripting engine that combines the best ideas from JSTL, Velocity, DVSL, Ant and Cocoon
Logging: Commons Logging is a thin adapter allowing configurable bridging to other well-known logging systems
Community Development: project that creates and provides tools, processes, and advice to help open-source software projects improve their own community health
SCIMple: an implementation of the SCIM v2.0 specification
DolphinScheduler: a distributed ETL scheduling engine with powerful DAG visualization interface
Doris: MPP-based interactive SQL data warehousing for reporting and analysis, good for both high-throughput scenarios and high-concurrency point queries
Drill: software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets
Druid: high-performance, column-oriented, distributed data store
Fluo: a distributed processing system that lets users make incremental updates to large data sets
Fluo Recipes: Apache Fluo Recipes build on the Fluo API to offer additional functionality to developers
Fluo YARN: a tool for running Apache Fluo applications in Apache Hadoop YARN
FreeMarker: a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers
Geode: low-latency, high-concurrency data management solutions
Kyuubi: a distributed multi-tenant Thrift JDBC/ODBC server for large-scale data management, processing, and analytics, built on top of Apache Spark and designed to support more engines
Libcloud: a standard Python library that abstracts away differences among multiple cloud provider APIs.
Linkis: a computation middleware project that decouples upper-layer applications from the underlying data engines and provides standardized interfaces (REST, JDBC, WebSocket, etc.) to easily connect to various underlying engines (Spark, Presto, Flink, etc.)
Ozone: scalable, redundant, and distributed object store for Hadoop
Parquet: a general-purpose columnar storage format
PDFBox: Java-based PDF library (reading, text extraction, manipulation, viewer)
Mod_perl: module that integrates the Perl interpreter into the Apache HTTP Server
Pekko: toolkit and an ecosystem for building highly concurrent, distributed, reactive and resilient applications for Java and Scala[9]
Petri: deals with the assessment of, education in, and adoption of the Foundation's policies and procedures for collaborative development and the pros and cons of joining the Foundation
Ranger: a framework to enable, monitor and manage comprehensive data security across the Hadoop platform
Ratis: Java implementation of the Raft consensus protocol
RocketMQ: a fast, low latency, reliable, scalable, distributed, easy to use message-oriented middleware, especially for processing large amounts of streaming data
Roller: a full-featured, multi-user and group blog server suitable for both small and large blog sites
Royale: improving developer productivity in creating applications for wherever JavaScript runs (and other runtimes)
Rya: cloud-based RDF triple store that supports SPARQL queries
Rivet: Server-side Tcl programming system combining ease of use and power
Websh: rapid development environment for building powerful, fast, and reliable web applications in Tcl
Tez: an effort to develop a generic application framework that can process arbitrarily complex directed acyclic graphs (DAGs) of data-processing tasks, along with a reusable set of data-processing primitives that other projects can use
Thrift: interface definition language and binary communication protocol used to define and create services for numerous languages
Tika: content analysis toolkit for extracting metadata and text from digital documents of various types, e.g., audio, video, image, office suite, web, mail, and binary
TinkerPop: A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP)
TomEE: an all-Apache Java EE 6 Web Profile stack for Apache Tomcat
Traffic Control: built around Apache Traffic Server as the caching software, Traffic Control implements all the core functions of a modern CDN
Batik: pure Java library for SVG content manipulation
FOP: Java print formatter driven by XSL formatting objects (XSL-FO); supported output formats include PDF, PS, PCL, AFP, XML (area tree representation), Print, AWT and PNG, and to a lesser extent, RTF and TXT
Yetus: a collection of libraries and tools that enable contribution and release processes for software projects
YuniKorn: standalone resource scheduler responsible for scheduling batch jobs and long-running services on large scale distributed systems
Zeppelin: a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems
ZooKeeper: coordination service for distributed applications
Incubating projects
Annotator: provides annotation-enabling code for browsers, servers, and humans
Baremaps: toolkit and a set of infrastructure components for creating, publishing and operating online maps
Celeborn: intermediate data service for big data computing engines to boost performance, stability and flexibility
DataLab: platform for creating self-service, exploratory data science environments in the cloud using best-of-breed data science tools
DevLake: development data platform, providing the data infrastructure for developer teams to analyze and improve their engineering productivity
HugeGraph: a large-scale and easy-to-use graph database
KIE: community of solutions and supporting tooling for knowledge engineering and process automation, focusing on events, rules and workflows
Liminal: an end-to-end platform for data engineers and scientists, allowing them to build, train and deploy machine learning models in a robust and agile way
Livy: web service that exposes a REST interface for managing long-running Spark contexts
Milagro: core security infrastructure for decentralized networks
OpenDAL: Open Data Access Layer; offers native layer support, enabling users to implement middleware or intercept all operations
Paimon: unified lake storage to build dynamic tables for both stream and batch processing with big data compute engines, supporting high-speed data ingestion and real-time data query
Pegasus: distributed key-value storage system which is designed to be simple, horizontally scalable, strongly consistent and high-performance
Pony Mail: mail-archiving, archive viewing, and interaction service
StreamPark: a streaming application development platform
Toree: provides applications with a mechanism to interactively and remotely access Spark
Training: aims to develop resources for training purposes in various media formats and languages, for both Apache and non-Apache target projects
Tuweni: set of libraries and other tools to aid development of blockchain and other decentralized software in Java and other JVM languages
A retired project is one that has been closed down, for various reasons, on the initiative of the board, the project's PMC, the PPMC or the IPMC. It is no longer developed at the Apache Software Foundation and does not have any other duties.
ACE: a distribution framework that allows central management and distribution of software components, configuration data and other artefacts to target systems
Any23: Anything To Triples, a library, web service and command-line tool that extracts structured data in RDF format from a variety of Web documents
Apex: Enterprise-grade unified stream and batch processing engine
Aurora: Mesos framework for long-running services and cron jobs
AxKit: XML Application Server for Apache. It provided on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code
Buildr: a build system for Java-based applications, including support for Scala, Groovy and a growing number of JVM languages and tools
Chemistry: provides open source implementations of the Content Management Interoperability Services (CMIS) specification
Chukwa: open-source data collection system for monitoring large distributed systems
Clerezza: a service platform providing functionality for managing semantically linked data, accessible securely through RESTful Web Services
Crimson: Java XML parser which supports XML 1.0 via various APIs
Crunch: Provides a framework for writing, testing, and running MapReduce pipelines
Deltacloud: provides common front-end APIs to abstract differences between cloud providers
DeviceMap: device data repository and classification API
DirectMemory: off-heap cache for the Java Virtual Machine
DRAT: large scale code license analysis, auditing and reporting
Eagle: open source analytics solution for identifying security and performance issues instantly on big data platforms
ECS: API for generating elements for various markup languages
ESME: secure and highly scalable microsharing and micromessaging platform that allows people to discover and meet one another and get controlled access to other sources of information, all in a business process context
Etch: cross-platform, language- and transport-independent RPC-like messaging framework
Excalibur: Java inversion of control framework including containers and components
ODE: WS-BPEL implementation that supports web services orchestration using flexible process definitions
ObJectRelationalBridge (OJB): Object/Relational mapping tool that allowed transparent persistence for Java Objects against relational databases
Oltu - Parent: OAuth protocol implementation in Java
Onami: project focused on the development and maintenance of a set of Google Guice extensions not provided out of the box by the library itself
OODT: Object Oriented Data Technology, a data management framework for capturing and sharing data
Open Climate Workbench: A comprehensive suite of algorithms, libraries, and interfaces designed to standardize and streamline the process of interacting with large quantities of observational data and conducting regional climate model evaluations
ORO: regular expression engine supporting various dialects
Polygene: community based effort exploring Composite Oriented Programming for domain centric application development
PredictionIO: open-source machine learning server built on top of a state-of-the-art open-source stack that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks
REEF: A scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos
River: provides a standards-compliant JINI service
Sentry: Fine grained authorization to data and metadata in Apache Hadoop
Shale: web application framework based on JavaServer Faces
Shindig: OpenSocial container; helps start hosting OpenSocial apps quickly by providing the code to render gadgets, proxy requests, and handle REST and RPC requests
Sqoop: a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases
STDCXX: implementation of the C++ Standard Library, a collection of algorithms, containers, iterators, and other fundamental components, implemented as C++ classes, templates, and functions essential for writing C++ programs
Stanbol: Software components for semantic content management
Tajo: relational data warehousing system that uses the Hadoop file system as distributed storage
Tiles: templating framework built to simplify the development of web application user interfaces.
Trafodion: Web-scale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop[11][12][13]
Tuscany: SCA implementation, also providing other SOA implementations
Twill: lets developers use Apache Hadoop YARN's distributed capabilities with a programming model similar to running threads
Usergrid: an open-source Backend-as-a-Service ("BaaS" or "mBaaS") composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications
VXQuery: parallel XML Query processor