Convert “iris” to a dataframe
If you are a beginner in Python, a very good way to start your journey is to do some data analysis on a dataset.
One such famous dataset is the ‘iris’ dataset.
According to scikit-learn:
“This data sets consists of 3 different types of irises’ (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray
The rows being the samples and the columns being: Sepal Length, Sepal Width, Petal Length and Petal Width.”
For more information on this dataset, you can visit the official dataset site: scikit-learn - iris
Now to the core question: how do we convert this dataset to a dataframe?
Initially, I searched a lot about this on Google, but the results were not easy for a beginner to comprehend, and in some cases the dataset was loaded from a csv file.
You don’t need a csv file of the “iris” dataset. It can be loaded very easily with a few lines of code.
import pandas as pd
from sklearn import datasets
# load_iris() returns a dictionary-like object holding the data and its description
iris = datasets.load_iris()
Now we have the iris dataset in the “iris” variable. We can view it just by typing iris and running the code. The output is a long, dictionary-like object that begins with the raw data array.
Scrolling to the bottom of the output, we also get descriptive information about the dataset.
We don’t need this in our final analysis dataset, but it is where we can find the column names (they are also available as iris.feature_names).
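As a minimal sketch of where those names live, assuming a standard scikit-learn install: the object returned by load_iris() can be indexed like a dictionary, and the column names sit under feature_names. The variable name feature_names below is only for illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The returned object is dictionary-like, so its keys can be listed
print(list(iris.keys()))

# Column names for the four measurements
feature_names = list(iris.feature_names)
print(feature_names)

# Start of the long descriptive text seen at the bottom of the output
print(iris.DESCR[:200])
```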
iris.data will come to our rescue and give us a clean dataset:
And the bottom of the output now looks like:
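To check what iris.data actually contains, a short sketch like the following may help. The variable name features is an assumption made for illustration; the shape matches the 150x4 array described in the scikit-learn quote above.

```python
from sklearn import datasets

iris = datasets.load_iris()

# iris.data is a NumPy array holding only the measurements
features = iris.data

print(type(features))
print(features.shape)  # (150, 4): 150 samples, 4 measurements each
print(features[:5])    # first five rows
```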
Now we come to the final part: converting this array to a dataframe. Before that, we need to look at the docstring of DataFrame to know its format. Here it is:
Note: in DataFrame, the D and the F are uppercase (pd.DataFrame).
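One way to look at that format without leaving the notebook is sketched below. It uses Python's standard inspect module to list the constructor's parameters; the exact list may differ between pandas versions, so treat the printed result as informational.

```python
import inspect

import pandas as pd

# List the parameter names accepted by the DataFrame constructor
params = list(inspect.signature(pd.DataFrame).parameters)
print(params)

# The full docstring can be read interactively with:
# help(pd.DataFrame)
```

The two parameters that matter for this post are data (the array) and columns (the list of column names).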
For column names (which we got earlier from the dataset info), we create a different list.
colname = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
And the heart of the post:
data1 = pd.DataFrame(data=iris.data, columns=colname)
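Putting the steps together, a self-contained sketch might look like this. It takes the column names from iris.feature_names instead of typing them by hand, which is a small variation on the approach above. The last line is an alternative that assumes scikit-learn 0.23 or newer, where load_iris accepts as_frame=True.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Column names taken directly from the dataset instead of a hand-typed list
colname = list(iris.feature_names)

# Build the dataframe from the measurement array
data1 = pd.DataFrame(data=iris.data, columns=colname)

print(data1.head())
print(data1.shape)

# Alternative (assumes scikit-learn >= 0.23): ask for a dataframe directly.
# The resulting frame also includes a "target" column with the species codes.
iris_frame = datasets.load_iris(as_frame=True).frame
```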
This is my first Medium post. I hope it will be useful to many beginners.