December 16, 2019

USEFUL PYTHON FOR DATA SCIENCE

In the two months since starting my job as a Production Engineer in the data science group at Natera, I’ve learned that modular, reusable code is incredibly useful when unexpected analyses and crises arise. Until recently, I would search through my growing collection of Jupyter notebooks until I found code similar to what I needed, then copy it over. Instead of opening five or six notebooks at a time, I decided to consolidate the common operations I perform on a daily basis into one notebook. And, to make it as widely applicable as possible, I decided to use the open-source Pokémon dataset from Kaggle.

I’ve put the notebook on Colab, Google’s platform for shareable Jupyter notebooks. If you’d like to copy it and run it yourself, the data I used can be found below.

Link to Notebook

built and baked by charlotte merzbacher