Spark applies the scatter-gather pattern in distributed programming, similar to MapReduce
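The scatter-gather idea can be sketched in plain Python (this is an analogy, not the Spark API; the `scatter_gather` helper and its parameters are made up for illustration): data is split into partitions, each partition is processed independently, and the partial results are combined.

```python
from functools import reduce

def scatter_gather(data, n_partitions, map_fn, combine_fn):
    # Scatter: split the data into partitions (a real cluster ships these to workers)
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    # Map: each "worker" processes its own partition independently
    partials = [reduce(combine_fn, map(map_fn, p)) for p in partitions]
    # Gather: combine the partial results on the driver
    return reduce(combine_fn, partials)

# Sum of word lengths over 4 partitions; same answer as a single-machine map/reduce
words = ["spark", "applies", "scatter", "gather", "like", "mapreduce"]
total = scatter_gather(words, 4, len, lambda a, b: a + b)
print(total)  # 38
```

Because the combine function is associative, it does not matter how the data is partitioned; that is the same property MapReduce relies on.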
In Spark, the data model is the RDD (Resilient Distributed Dataset)
Spark is agnostic of the storage model (it can work on top of HDFS, Cassandra, etc.)
Transformation functions turn one RDD into another; action functions (e.g. reduce, collect) compute a result from an RDD
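The transformation/action split can be mimicked in plain Python (a rough sketch, not the actual Spark API): chained `map`/`filter` calls build a lazy pipeline, like RDD transformations, and a terminal `reduce` forces a result, like an RDD action.

```python
from functools import reduce

data = range(1, 6)  # stand-in for a distributed dataset

# "Transformations": lazy, nothing is computed yet (like rdd.map / rdd.filter)
squared = map(lambda x: x * x, data)
evens = filter(lambda x: x % 2 == 0, squared)

# "Action": forces evaluation and returns a value to the caller (like rdd.reduce)
total = reduce(lambda a, b: a + b, evens)
print(total)  # 4 + 16 = 20
```

In real Spark this would be e.g. `rdd.map(...).filter(...).reduce(...)`, with the same shape: transformations describe the computation, the action triggers it.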
It is primarily a batch-mode processing framework, but Spark also supports stream processing through Spark Streaming
An RDD:
- Can be larger than a single computer
- Read from an input source
- Can be output from a pure function
- Immutable
- Typed
- Ordered
- Lazily Evaluated
- Partitioned
- A collection of elements
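Lazy evaluation in particular can be demonstrated in plain Python, since `map()` is lazy in the same way a transformation is (again an analogy, not the Spark API; `traced_square` and `log` are made up for illustration):

```python
log = []

def traced_square(x):
    log.append(x)  # record when evaluation actually happens
    return x * x

nums = [1, 2, 3, 4]

# Building the pipeline is free: nothing has been computed yet
pipeline = map(traced_square, nums)
assert log == []

# An "action" (here: list()) forces evaluation of the whole pipeline
result = list(pipeline)
print(result)  # [1, 4, 9, 16]
print(log)     # [1, 2, 3, 4] -- the work happened only at action time
```

This is why a long chain of Spark transformations returns instantly: no data moves until an action runs.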