predicting data using date range method in python pandas

Table of Contents

  1. Introduction
  2. Predicting values for missing dates
  3. More about date_range function
  4. Video Tutorial

1. Introduction

In the previous session, we discussed how to predict missing values in the records. In this session, we are going to discuss predicting values for the missing dates.

Refer the figure 1. It shows the data set but there are no records for some dates. As an example see the row 9 and 10 dates. The records for 9,10,11,12 are missing. Hence, in this session, we are going to predict the values for those missing days.

Figure 1: Data set with missing dates

2 Predicting values for missing dates

First import Pandas and Numpy libraries and the data set into Jupyter notebook as shown in figure 2.  Note that the dteday column is not linear, as some date values and their records are missing. Also, some records are missing as well (represented as NaN).

Figure 2: Importing the data set

We know that there are missing dates. So we can use the date_range() function to get all the dates within a range.  Pass the starting date and the ending date into the date_range() function as shown below. Since we need to get all the dates between January 1st to January 31st, we need to give the starting date as 01-01-2011 and the ending date as 01-31-2011. Then let’s assign it to a variable named date_ range.

date_range = pd.date_range(’01 – 01- 2011’, ‘01- 31- 2011’)

Execute the date_range variable. The output will be a list of dates as shown in the figure 3.

Figure 3: Using the date_range function to get the all dates

Then we need to append the date_range variable as the date column of our bike_sharing data frame. To do this, the reindex() function in Pandas can be used. The column labels and row labels of a data frame will be changed by the reindex() function. It will conform the date values with the matching record. But we need to specify that the missing records should be represented as NaN, by using the fill_valueparameter.

We need to override the existing data frame by the new values which were obtained after re-indexing. The code is shown in figure 4.

Figure 4:Code for re-indexing and overriding the data frame

Execute the bike_sharing data frame and observe the updated table in figure 5. Note that the index column contains all the dates within the given range and the missing values are represented as NaN as we specified.

Figure 5:  Updated table containing all the dates

Now let’s use the linear interpolation to predict the missing records. To do this, use the interpolate() method and specify the method parameter as linear. Use the around() function in Numpy library to round the approximated values as shown in figure 6.

Figure 6: Using linear interpolation to predict values

Execute the code and observe the output. It can be seen values for the missing records are predicted and rounded as shown in figure 7.

Figure-7

3. More about date_range function

a. Start and End parameters

We have seen that start and end date don’t have to be explicitly specified. But, if we want to, we can explicitly specify these two values by using: start and end parameters. This is shown in figure 8. Set the freq parameter as ‘D’ which is discussed in next point

Figure 8: Specifying starting and ending dates using parameters

b. The freq and periods parameters

If we want to obtain the dates according to a specific frequency, the freq parameter can be used.

The periods parameter can be used to specify the number of data that should be displayed. Refer the figure 9. The period is given as 10, along with the start date. Hence, it shows only 10 dates starting from 2011 – 01- 01

Figure-9

Let’s look at some possible values to pass into the freq parameter

D – The default frequency

This value outputs the dates as shown in the calendar. Observe the figure 10.

Figure 10: The default frequency

B – The business days

This displays only the business days, as shown in the figure 101

Figure-11

M – The month end frequency (figure 12)

Figure 12: Month end frequency

Video Tutorial

Leave a Reply

Your email address will not be published. Required fields are marked *