python pandas converting string to date datatype

Table of Contents

  1. Converting string columns to date
  2. Passing a null value when date is invalid
  3. Passing invalid record when the record doesn’t have a valid date
  4. Video Tutorial
  1. Converting string columns to date

First, launch the Jupyterlab and open a Jupyter notebook. Import Pandas and the csv file. Use the head() function and give “ 3” as the argument in order to display the first 3 rows of the data frame as shown in figure 1.

Note that the year, month and date are separated by slashes

Figure 1: The data frame

In order to check the data types use dtypes function on the data frame and execute it. As you can see in figure 2, the Year column contains dates but they are given as object data type.

Figure 2:  The Year column values are given in object data type

In order to convert the object data type of the “Year” column to date data type to_datetime() function can be used.  Pass the column that is needed to converted in to date data type. Make sure to specify the data frame where the column is present as shown below,

car_sales.Year = pd.to_datetime(car_sales.Year)

This code basically means that the converted data type should be assigned instead of the previous one. This is shown in figure 3. Note that the year, month, and date are not separated by slashes as previously

Figure 3: Converting to date data type
  1. Then using the dtypes function display the data types of the data frame as shown in figure 4. It can be seen that now the data type of the Year column is datetime64[ns] instead of object.
Figure-4

Let’s assume that now we also have the timestamp where there was only a date. To demonstrate this, add timestamps to the original data file as shown in figure 5. Then re-import it and display the data types of each column. It can be observed that the data type of the Year column is an object as shown in figure 6.

Figure 5: Adding timestamps
Figure 6: Reimporting the data set

Then convert to DateTime data type by using the to_datetime() as we discussed before. Then display the data types using dtypes function. It can be seen that the data type of the Year column is converted into the DateTime data type as shown in figure 7.

Figure 7: Converting after adding the timestamp

2 Passing a null value when date is invalid

To demonstrate this scenario, let’s put the second cell of the Year columns’ month as 13. Which is an invalid date as there are only 12 months.  Then re-import the data as shown in figure 8. Because the data type of the Year column is still “object”, it doesn’t show that it’s invalid.

Figure 8: The invalid record

Then convert the data type to datetime data type and display the data set again. Then an error shows up saying that “month must be in 1…….12” as shown in figure 9.

Figure 9: The error

To get through this, the invalid record can be replaced by a null value. This can be done by passing an additional parameter which is the errors parameter. Set the value for the parameter as coerce as shown below,

errors= ‘coerce’

Add this parameter to the to_datetime function and execute as shown in figure 10.  Now the invalid record is shown as NaT. Also, note that the date type of the Year column is still DateTime data type.

Figure-10

3 Passing invalid record when record doesn’t have valid date

If we want to display the invalid record (2007/13/04), even if it’s wrong, the parameter errors should be set to as ignore,

errors= ‘ignore’

After executing the data frame, it can be observed that the invalid data is displayed. Then display the data types. Note that the datatype is displayed as an object, it no longer has the DateTime data type. This is shown in figure 11.

Figure 11: Displaying the invalid record

Video Tutorial

Leave a Reply

Your email address will not be published. Required fields are marked *