Convert “iris” to a dataframe

Vaibhav Kabdwal
3 min read · Oct 29, 2020

If you are a beginner in Python, a very good way to start your journey is to do some data analysis on a dataset.

One such famous dataset is the ‘iris’ dataset.

Iris Flower (https://www.proflowers.com/blog/history-and-meaning-of-iris)

According to scikit-learn:

“This data sets consists of 3 different types of irises’ (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray

The rows being the samples and the columns being: Sepal Length, Sepal Width, Petal Length and Petal Width.”
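As a quick check of the description above, here is a minimal sketch that prints the shape of the data and the species names. It assumes scikit-learn is installed. Note that the labels stored in the dataset are lowercase and use the spelling "versicolor".

```python
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples (rows) by 4 measurements (columns)
print(iris.data.shape)

# The three iris species, stored as a NumPy array of strings
print(list(iris.target_names))
```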

For more information on this dataset, you can visit the official dataset site: scikit-learn - iris

Now to the core question: how do we convert this dataset to a DataFrame?

Initially, I searched a lot about this on Google, but the results were not easy for a beginner to follow, and in some cases the dataset was loaded from a CSV file.

You don’t need a CSV file of the “iris” dataset. It ships with scikit-learn and can be loaded with a few lines of code:

import pandas as pd
from sklearn import datasets

# load_iris() returns a dictionary-like object holding the data and its description
iris = datasets.load_iris()

Now the iris dataset is stored in the “iris” variable. We can display it just by typing iris and running the code. The output will look like this:

Output

Upon scrolling to the bottom, we will also get the following information about the dataset:

Output (Upon scrolling down)

We don’t need this description in our final analysis dataset, but it does tell us the column names:

At the end of output
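Instead of scrolling through the printed output, the same information can be read from the object directly. The sketch below assumes the object returned by load_iris() behaves like a dictionary. The exact set of keys can vary between scikit-learn versions, so only the commonly present ones are relied on here.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The object behaves like a dictionary, so its keys can be listed
print(list(iris.keys()))

# The column names are available directly as a list of strings
print(iris.feature_names)

# The start of the long text description seen at the bottom of the output
print(iris.DESCR[:200])
```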

iris.data will come to our rescue and give us a clean dataset:

And the bottom of the output now looks like:
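To look at the top and bottom of the array without scrolling, it can be sliced. This is a small sketch, not part of the original walkthrough.

```python
from sklearn import datasets

iris = datasets.load_iris()

# iris.data is a plain NumPy array holding only the measurements
print(type(iris.data))

print(iris.data[:5])    # first five rows
print(iris.data[-5:])   # last five rows
```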

Now we come to the final part: converting this array to a DataFrame. Before that, we need to look at the DataFrame docstring to see what inputs it expects. Here it is:

DataFrame inputs

Alert: in pandas, the class name is written with an uppercase D and F (DataFrame).

For the column names (which we got earlier from the dataset info), we create a separate list.
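To see how the data and columns inputs fit together, here is a sketch on a tiny made-up array. The names toy and toy_df and the labels "a" and "b" are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 array, used only to illustrate the constructor
toy = np.array([[1.0, 2.0],
                [3.0, 4.0]])

# data: the array of values; columns: one label per column of the array
toy_df = pd.DataFrame(data=toy, columns=["a", "b"])
print(toy_df)
```

The number of labels passed to columns has to match the number of columns in the array, which is why the iris data needs four names.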

colname = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

And the heart of the post:

# iris.data is the clean array of measurements shown earlier
data1 = pd.DataFrame(data=iris.data, columns=colname)

Final Output
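Putting the steps together, the whole conversion might look like the sketch below. As a small variation on the code above, it takes the column names from iris.feature_names rather than a hand-typed list; that substitution is my assumption about what is convenient, not something the steps above require.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Reuse the names stored in the dataset instead of typing them by hand
data1 = pd.DataFrame(data=iris.data, columns=iris.feature_names)

print(data1.shape)    # number of rows and columns
print(data1.head())   # first five rows of the DataFrame
```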

This is my first Medium post. I hope it will be useful to many beginners.
