Turning a Python dictionary into a pandas DataFrame

One hard part of learning things online is that teaching materials, and students (including me!), are often imperfect. In a DataCamp exercise, I misread some instructions about turning a Python dictionary into a DataFrame, and I went down a rabbit hole as a result. This is what I learned.

Dictionaries to DataFrames

The easy way

The structure of your dictionary determines how hard or easy it is to make it into a pandas DataFrame. If your dictionary already resembles a DataFrame, with keys that serve as column labels and values that are lists of the entries in each column, the conversion is easy. Say you have this:

my_dict = {"years": [1999, 2000, 2001], 
           "num_friends": [5, 3, 0]}

Making this into a DataFrame is trivial (assume pandas is already imported as pd):

my_df = pd.DataFrame(my_dict) # no method needed

The resulting DataFrame my_df looks like this:

   years  num_friends
0   1999            5
1   2000            3
2   2001            0
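To double-check how the dictionary maps onto the DataFrame, here is a small self-contained sketch. It only reuses the my_dict from above and inspects the result; nothing in it is specific to the DataCamp exercise.

```python
import pandas as pd

my_dict = {"years": [1999, 2000, 2001],
           "num_friends": [5, 3, 0]}
my_df = pd.DataFrame(my_dict)

# Each key became a column label, in the order the keys were written
print(list(my_df.columns))  # ['years', 'num_friends']

# Each position in the lists became one row: 3 rows, 2 columns
print(my_df.shape)  # (3, 2)
```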

The hard way

If your dictionary does not look like a DataFrame, getting it into that shape is trickier. Suppose your dictionary has basically the same information as the previous one, but it just maps years to numbers of friends, like so:

my_dict = {"1999": 5, "2000": 3, "2001": 0}

If you pass this dictionary straight to pd.DataFrame(), you get an error about using all scalar values (see explanation at Statology), so you have to wrap it in a list first. When you do, the keys become columns. So from this:

my_df = pd.DataFrame([my_dict]) # note brackets

my_df now looks like this:

   1999  2000  2001
0     5     3     0
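If you want to see the scalar-values error for yourself without crashing your script, something like the following sketch should do it. The error_raised flag is just a name I made up for illustration.

```python
import pandas as pd

my_dict = {"1999": 5, "2000": 3, "2001": 0}

try:
    # Every value is a single number, so pandas has no way to infer row labels
    pd.DataFrame(my_dict)
    error_raised = False
except ValueError as err:
    error_raised = True
    print(err)

# Wrapping the dictionary in a list tells pandas to treat it as one row
my_df = pd.DataFrame([my_dict])
print(my_df.shape)  # (1, 3)
```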

If you were trying to get the years and friend numbers in their own columns, you might try the DataFrame.from_dict() method. Passing orient="index" turns each key into a row instead of a column:

my_df_cols = pd.DataFrame.from_dict(
    my_dict,
    orient="index",  # keys become row labels instead of column labels
    columns=["num_friends"]  # this name is arbitrary
)

The result, my_df_cols, looks like this:

      num_friends
1999            5
2000            3
2001            0
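A quick way to confirm where the years ended up is to look at the index and the columns separately. This is only a sketch for checking the result of the call above.

```python
import pandas as pd

my_dict = {"1999": 5, "2000": 3, "2001": 0}
my_df_cols = pd.DataFrame.from_dict(
    my_dict,
    orient="index",
    columns=["num_friends"]
)

# The dictionary keys are now the row labels (the index), still as strings
print(list(my_df_cols.index))  # ['1999', '2000', '2001']

# The only real column is the one we named
print(list(my_df_cols.columns))  # ['num_friends']
```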

This is better, but we want the years in their own column, not serving as row labels. To do this, skip from_dict() and pass the dictionary's items() to the DataFrame constructor instead (approach inferred from StackOverflow):

my_df_cols = pd.DataFrame(
    my_dict.items(),  # each (key, value) pair becomes one row
    columns=["year", "num_friends"]
)

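The resulting DataFrame my_df_cols has the same layout as what we got from the first dictionary, with the years and the friend counts each in their own column. The one difference is that the years are still strings here, because the dictionary keys were strings: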

   year  num_friends
0  1999            5
1  2000            3
2  2001            0
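Since the keys in this dictionary are strings, the year column holds strings too. If you want integer years like in the first example, one option (my own addition, not part of the exercise) is to convert the column afterwards. I also wrap items() in list() here, which is an assumption on my part to stay safe on older pandas versions.

```python
import pandas as pd

my_dict = {"1999": 5, "2000": 3, "2001": 0}

# list() turns the key-value pairs into an explicit list of (year, count) tuples
my_df_cols = pd.DataFrame(
    list(my_dict.items()),
    columns=["year", "num_friends"]
)

# The keys were strings, so convert the year column to integers
my_df_cols["year"] = my_df_cols["year"].astype(int)

print(my_df_cols["year"].tolist())  # [1999, 2000, 2001]
```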

For now, that’s what I know!