Introduction

According to Wikipedia's definition, Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.

- Open-source cluster-computing framework
- Can perform both real-time and batch data processing
- Originally developed at the University of California, Berkeley's AMPLab and later donated to the Apache Software Foundation
- High-level APIs in Java, Scala, R, and Python

Hadoop MapReduce Limitations