Pandas and Big Data #7

synapticarbors · 2016-10-27T14:43:07Z

At the most recent meetup there was some interest in learning how to do "Big Data" with pandas. To start the discussion, I'll frame that as analysis/manipulation of data that is larger than can easily fit in memory on your laptop/workstation. There are a number of tools out there to do this. The one I'm most familiar with is Dask (http://dask.pydata.org/).
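The core trick behind larger-than-memory work is to stream the data in pieces and combine partial results. Below is a minimal sketch of that idea using plain pandas chunked reading (not Dask itself). The CSV contents, column names, and the helper `mean_by_key_chunked` are hypothetical, chosen only to illustrate the pattern.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a file too large to load at once.
CSV_TEXT = """station,temp
a,10.0
b,20.0
a,14.0
b,22.0
c,5.0
"""


def mean_by_key_chunked(source, key, value, chunksize=2):
    """Compute per-key means while holding only `chunksize` rows in memory."""
    sums = None
    counts = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        grouped = chunk.groupby(key)[value]
        s, c = grouped.sum(), grouped.count()
        # Combine partial results; keys missing from one side count as 0.
        sums = s if sums is None else sums.add(s, fill_value=0)
        counts = c if counts is None else counts.add(c, fill_value=0)
    return sums / counts


result = mean_by_key_chunked(io.StringIO(CSV_TEXT), "station", "temp")
print(result)
```

Dask's `dask.dataframe` automates this split/apply/combine bookkeeping; roughly the same query there would be `dd.read_csv(path).groupby("station").temp.mean().compute()`, where `path` is a placeholder for your own file(s).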

Anyone interested in planning for a talk/tutorial in 2017?

cc/ @AlbertDeFusco @annafil

AlbertDeFusco · 2016-10-27T14:54:35Z

Sure, I'll get involved. Dask is a really great tool for getting started with larger-than-memory data.

Other things that can be discussed in the spectrum of tall data to big data:

pyspark and its SQL context for easy mapping to Pandas. Works on Hive, too.
Database access with PyODBC and SQLAlchemy
Blaze expression translation for back-end agnostic data access
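For the database route, one common pattern is to push the heavy aggregation into SQL and pull only the small result into pandas. A minimal sketch, assuming an in-memory SQLite database and a hypothetical `readings` table; with SQLAlchemy you would pass an engine from `create_engine(...)` in place of the `sqlite3` connection.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real server (hypothetical data).
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"station": ["a", "b", "a", "b", "c"], "temp": [10.0, 20.0, 14.0, 22.0, 5.0]}
).to_sql("readings", conn, index=False)

# Let the database do the aggregation so only the small result reaches pandas.
summary = pd.read_sql(
    "SELECT station, AVG(temp) AS mean_temp FROM readings "
    "GROUP BY station ORDER BY station",
    conn,
)

# For row-level pulls, chunksize makes read_sql yield DataFrames piece by piece.
total_rows = sum(
    len(chunk) for chunk in pd.read_sql("SELECT * FROM readings", conn, chunksize=2)
)
print(summary)
print(total_rows)
```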

robert-lucente · 2016-10-27T21:45:29Z

It is interesting how Dask builds a "task graph". This concept of a task graph shows up over and over. Terraform is another example that I recently ran into (github.com/hashicorp/terraform).
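To make the task-graph idea concrete: Dask's documentation describes a graph as a plain dict mapping keys to either data or a task tuple whose first element is a callable and whose remaining elements may name other keys. The toy executor below is a sketch of evaluating such a dict by resolving dependencies first and caching each result; it is not Dask's scheduler, and the function names `get` and `_is_key` here are illustrative.

```python
from operator import add, mul


def _is_key(graph, obj):
    """True if `obj` names a node in the graph (unhashable args never do)."""
    try:
        return obj in graph
    except TypeError:  # e.g. a list argument
        return False


def get(graph, key, cache=None):
    """Evaluate `key` in a dask-style graph: dict of key -> data or task tuple."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]  # each node is computed at most once
    node = graph[key]
    if isinstance(node, tuple) and node and callable(node[0]):
        func, *args = node
        # Arguments that name other keys are dependencies; evaluate them first.
        resolved = [get(graph, a, cache) if _is_key(graph, a) else a for a in args]
        value = func(*resolved)
    else:
        value = node  # plain data leaf
    cache[key] = value
    return value


graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),
    "w": (mul, "z", 10),
}
print(get(graph, "w"))
```

Because dependencies are explicit, a real scheduler can run independent nodes (here `x` and `y`) in parallel or spill intermediates to disk, which is the same dependency-ordering idea Terraform applies to infrastructure resources.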

infinite-tape added the topic label Oct 28, 2016
