Python For Data Science Cheat Sheet




My little collection of Python recipes for data science featuring Pandas, Matplotlib, and friends.

Pandas: reading a CSV from a string

In Pandas we can load data from a CSV file with read_csv:
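
A minimal sketch of that call. The file name `data.csv` and its contents are made up for illustration, and the file is written to a temporary directory first only so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical CSV content, written to a temporary file for the example.
csv_path = Path(tempfile.mkdtemp()) / "data.csv"
csv_path.write_text("year,amount\n2019,100\n2020,150\n")

# read_csv accepts a path and returns a DataFrame.
df = pd.read_csv(csv_path)
print(df)
```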

Now, it's not uncommon to have some tabular data as a string:
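
Such a string might look like the following. The column names and values are assumptions chosen for illustration.

```python
# Hypothetical tabular data held in a plain Python string.
data = """year,amount
2019,100
2020,150
2021,120"""

# The first line holds the column names, the rest are rows.
header, *rows = data.splitlines()
print(header)
print(len(rows), "rows")
```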

To load this string as if it were a file we can use Python's built-in StringIO:
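
A small sketch of the idea, again with made-up data: `io.StringIO` wraps the string in a file-like object, which `read_csv` accepts in place of a path.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV data as a string.
data = """year,amount
2019,100
2020,150
2021,120"""

# StringIO gives read_csv a file-like object to read from.
df = pd.read_csv(StringIO(data))
print(df)
```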


Credits to my friend Ernesto for this tip.

How to plot a CSV with Pandas

Consider this file-like CSV:
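
Something along these lines would do. The `year` and `amount` columns match the ones used later in this recipe, but the values are invented for the example.

```python
from io import StringIO

# Hypothetical CSV wrapped in a file-like object.
data = StringIO("""year,amount
2017,120
2018,135
2019,150
2020,110""")

# A file-like object can be read just like an open file.
print(data.readline().strip())
data.seek(0)  # rewind so the object can be read again from the start
```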


To plot this CSV with Pandas we call the plot method on the DataFrame:
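
As a rough sketch with hypothetical data, `DataFrame.plot` draws every numeric column as a line and returns a Matplotlib `Axes` object. Matplotlib must be installed for this to work.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV data.
data = StringIO("""year,amount
2017,120
2018,135
2019,150
2020,110""")

df = pd.read_csv(data)

# With no arguments, both numeric columns (year and amount) are drawn
# as separate lines against the row index.
ax = df.plot()
```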

To display the plot on screen we call show on plt:
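
Here `plt` is assumed to be the conventional alias for `matplotlib.pyplot`. A sketch with made-up data:

```python
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV data.
data = StringIO("""year,amount
2017,120
2018,135
2019,150
2020,110""")

df = pd.read_csv(data)
df.plot()

# Opens a window with the figure when an interactive backend is available.
plt.show()
```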

You can also save the plot with savefig:
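
A sketch of saving to disk. The output name `plot.png` is an arbitrary choice, and a temporary directory is used here only to keep the example self-contained.

```python
import tempfile
from io import StringIO
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV data.
data = StringIO("""year,amount
2017,120
2018,135
2019,150
2020,110""")

df = pd.read_csv(data)
df.plot()

# savefig writes the current figure; the format follows the file extension.
out_path = Path(tempfile.mkdtemp()) / "plot.png"
plt.savefig(out_path)
```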

Now, you'll notice that the resulting picture has two labels, taken from the CSV columns, but the x axis shows the index of each DataFrame row rather than the year.

A DataFrame in fact has an index:
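
To illustrate with hypothetical data: when no index column is specified, `read_csv` assigns a default integer index starting at 0.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV data.
data = StringIO("""year,amount
2017,120
2018,135
2019,150
2020,110""")

df = pd.read_csv(data)

# The default index is a RangeIndex, one integer label per row.
print(df.index)  # RangeIndex(start=0, stop=4, step=1)
```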


To use the year column instead of the index for the x axis, we can pass the x and y arguments to plot (in this example you can omit y):
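
A sketch with invented data. Passing `x="year"` puts the years on the x axis; `y="amount"` is spelled out here, though with only one other column it could be left out.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV data.
data = StringIO("""year,amount
2017,120
2018,135
2019,150
2020,110""")

df = pd.read_csv(data)

# x selects the column for the horizontal axis, y the column to draw.
ax = df.plot(x="year", y="amount")
```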

Now the plot is consistent with the dataset: the x axis shows the years.

How to groupby in Pandas


Suppose you've got a CSV with two columns, year and amount:
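
For example, with made-up values where some years appear more than once:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: several amounts may share the same year.
data = """year,amount
2019,100
2019,50
2020,150
2020,25
2021,120"""

df = pd.read_csv(StringIO(data))
print(df)
```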

To compute the total amount per year you can group by year and then call sum:
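
A short sketch, using hypothetical data in which each year has one or more rows:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: several amounts may share the same year.
data = """year,amount
2019,100
2019,50
2020,150
2020,25
2021,120"""

df = pd.read_csv(StringIO(data))

# Rows are grouped by year, then the amounts in each group are added up.
totals = df.groupby("year").sum()
print(totals)
```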

This gives you a new DataFrame as expected:
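
One way to check this, sketched with hypothetical data: the result is a DataFrame whose index is the grouping column, and `reset_index` turns `year` back into a regular column if that is more convenient.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: several amounts may share the same year.
data = """year,amount
2019,100
2019,50
2020,150
2020,25
2021,120"""

df = pd.read_csv(StringIO(data))
totals = df.groupby("year").sum()

# The grouped result is a DataFrame indexed by year.
print(type(totals).__name__, totals.index.name)

# reset_index moves year from the index back into the columns.
flat = totals.reset_index()
print(list(flat.columns))
```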