1 RStudio, "sparklyr–R interface for Apache Spark"
2 dplyr, "dplyr: A grammar of data manipulation"
3 TopicModeling, "Topic modeling on Apache Spark"
4 Scala, "The Scala programming language"
5 Shvachko, K., "The Hadoop distributed file system" IEEE 1-10, 2010
6 Spark-tfocs, "TFOCS for Spark: A community port of TFOCS for Apache Spark"
7 H2O.ai, "Sparkling Water"
8 Sparkit-learn, "Sparkit-learn"
9 SparkR, "SparkR (R on Spark)"
10 Moritz, P., "SparkNet: Training deep networks in Spark"
11 Zaharia, M., "Spark: cluster computing with working sets" USENIX Association 2010
12 Armbrust, M., "Spark SQL: Relational data processing in Spark" ACM 1383-1394, 2015
13 Spark-cassandra-connector, "Spark Cassandra Connector"
14 Spark-sklearn, "Scikit-learn integration package for Apache Spark"
15 Pedregosa, F., "Scikit-learn: Machine learning in Python" 12 : 2825-2830, 2011
16 Hunter, T., "Scaling the mobile millennium system in the cloud" ACM 2011
17 Bahmani, B., "Scalable k-means++" 5 : 622-633, 2012
18 Zaharia, M., "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing" USENIX Association 2012
19 Spark Wiki, "Powered By Spark"
20 Hindman, B., "Mesos: A platform for fine-grained resource sharing in the data center" USENIX Association 2011
21 Zadeh, R. B., "Matrix computations and optimization in Apache Spark" ACM 31-38, 2016
22 Dean, J., "MapReduce: simplified data processing on large clusters" 51 : 107-113, 2008
23 Meng, X., "MLlib: Machine learning in Apache Spark" 17 : 1-7, 2016
24 Kraska, T., "MLbase: A distributed machine-learning system" 2013
25 H2O.ai, "H2O.ai - AI for Business"
26 Xin, R., "GraySort on Apache Spark by Databricks"
27 Xin, R. S., "GraphX: Unifying data-parallel and graph-parallel analytics"
28 Zaharia, M., "Discretized streams: Fault-tolerant streaming computation at scale" ACM 423-438, 2013
29 Kim, H., "DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility"
30 Spark Wiki, "Committers"
31 Lakshman, A., "Cassandra: a decentralized structured storage system" 44 : 35-40, 2010
32 Spark, "Apache Spark"
33 Zeppelin, "Apache Zeppelin"
34 Vavilapalli, V. K., "Apache Hadoop YARN: Yet another resource negotiator" ACM 2013
35 HBase, "Apache HBase"
36 Lehoucq, R. B., "ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, 6" SIAM 1998