My little collection of Python recipes for data science featuring Pandas, Matplotlib, and friends.
Pandas: reading a CSV from a string
In Pandas we can load data from a CSV file with read_csv:
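A minimal sketch of this call. The sample rows and the temporary file name are hypothetical, created here only so the example runs on its own:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, written to a temporary file
# so that the example does not depend on an existing CSV.
csv_text = "year,amount\n2019,120\n2020,150\n2021,90\n"
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# read_csv takes a path and returns a DataFrame.
df = pd.read_csv(csv_path)
print(df)
```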
Now, it's not uncommon to have some tabular data as a string:
To load this string as a file we can use Python's built-in StringIO:
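As a sketch, assuming a hypothetical two-column dataset stored in a string:

```python
from io import StringIO

import pandas as pd

# Hypothetical tabular data held in a plain string.
data = """year,amount
2019,120
2020,150
2021,90
"""

# StringIO wraps the string in a file-like object that read_csv can consume.
df = pd.read_csv(StringIO(data))
print(df)
```

Since read_csv accepts any file-like object, no temporary file is needed.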

Credits to my friend Ernesto for this tip.
How to plot a CSV with Pandas
Consider this file-like CSV:

To plot this CSV with Pandas we call the plot method on the DataFrame:
To display the plot we call show on plt:
You can also save the plot with savefig:
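The steps above could look roughly like this. The data, the output path, and the choice of the non-interactive Agg backend are assumptions made so the sketch runs without a display:

```python
import os
import tempfile
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical file-like CSV.
data = """year,amount
2019,120
2020,150
2021,90
"""
df = pd.read_csv(StringIO(data))

# plot() draws every numeric column against the row index.
ax = df.plot()

# Save the current figure to disk (hypothetical output path).
out_path = os.path.join(tempfile.mkdtemp(), "plot.png")
plt.savefig(out_path)

# With an interactive backend, this would open the plot in a window:
# plt.show()
```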
Now, you'll notice that the resulting picture does have two labels, taken from the CSV columns. But the x axis follows the index of each DataFrame row:
A DataFrame does in fact have an index:
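A quick way to inspect it, again assuming a hypothetical three-row dataset:

```python
from io import StringIO

import pandas as pd

# Hypothetical data; any CSV loaded without index_col behaves the same way.
data = """year,amount
2019,120
2020,150
2021,90
"""
df = pd.read_csv(StringIO(data))

# By default read_csv assigns a RangeIndex that counts rows from 0.
print(df.index)
row_labels = list(df.index)
```

These row labels, not the year column, are what plot uses for the x axis by default.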

To use the year column instead of the index for the x axis, we can pass the x and y arguments to plot (in this example you can omit y):
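A sketch under the same assumptions as before (hypothetical data and column names, headless Agg backend):

```python
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with a year column and a single value column.
data = """year,amount
2019,120
2020,150
2021,90
"""
df = pd.read_csv(StringIO(data))

# x selects the column used for the horizontal axis, y the column to draw.
# With only one other column, y="amount" could be left out.
ax = df.plot(x="year", y="amount")
x_values = list(ax.get_lines()[0].get_xdata())
```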
Now the plot is coherent with the dataset:
How to groupby in Pandas
Suppose you've got a CSV with two columns, year and amount:
To compute the total amount per year you can group by year and then call sum:
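For instance, with a hypothetical dataset where some years appear more than once:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with repeated years.
data = """year,amount
2019,100
2019,20
2020,150
2021,40
2021,50
"""
df = pd.read_csv(StringIO(data))

# Group the rows by year, then sum the remaining column within each group.
totals = df.groupby("year").sum()
print(totals)
```

The result is indexed by year; chaining reset_index() turns year back into a regular column if that is more convenient.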
This gives you a new DataFrame as expected:
