Image by Author | Midjourney

Time-based data can be tricky to work with when it spans different time zones. The same clock reading refers to different moments depending on where it was recorded, so timestamps become hard to interpret. This guide will help you manage time zones and timestamps with the Pandas library in Python.

Preparation

In this tutorial, we'll use the Pandas package. We can install the package using the following code.
pip install pandas

Now, we'll explore how to work with time-based data in Pandas with practical examples.

Dealing with Time Zones and Timestamps with Pandas

Time data gives events a time-specific reference. Its most precise form is the timestamp, which records detailed information about time, from the year down to the millisecond (Pandas itself stores timestamps with up to nanosecond resolution).
Let's start by creating a sample dataset.
import pandas as pd

# Sample transactions; the timestamps start out as plain strings
data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'quantity': [100, 200, 150]
}
df = pd.DataFrame(data)
# Parse the strings into (time zone-naive) datetime values
df['timestamp'] = pd.to_datetime(df['timestamp'])

The ‘timestamp’ column in the example above contains time data with second-level precision, but it is initially stored as strings. To convert this column to a datetime format, we use the pd.to_datetime function.
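The conversion can also be checked in isolation. The short sketch below parses the same three strings with an explicit format argument; format is optional for ISO-style strings like these, and the format string shown is simply an assumption that matches the sample data. Supplying it makes parsing stricter, since a value that does not fit the pattern raises an error instead of being guessed.

```python
import pandas as pd

# The same sample timestamps, still stored as plain strings
raw = pd.Series(['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'])

# An explicit format documents the expected layout; non-matching strings raise an error
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S')

# The result has a datetime dtype, so the .dt accessor is now available
print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(parsed.dt.year.tolist())                       # [2023, 2024, 2024]
print(parsed.dt.tz)                                  # None: the values are time zone-naive
```

Checking the dtype with is_datetime64_any_dtype rather than comparing it to a fixed string keeps the check valid across Pandas versions, which may store the parsed values at different resolutions.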
Afterward, we can make the datetime data time zone-aware. For example, we can localize the naive timestamps to Coordinated Universal Time (UTC) with tz_localize, which attaches the time zone without changing the clock times.
df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)

Output>>>
   transaction_id           timestamp  quantity             timestamp_utc
0               1 2023-06-15 12:00:05       100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02       200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43       150 2024-06-15 21:17:43+00:00
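Note that tz_localize does not shift the clock times: it only declares which zone the naive values belong to. The sketch below reuses one of the sample timestamps, contrasts localizing it as UTC and as Asia/Tokyo, and shows that tz_convert refuses time zone-naive input.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(['2024-06-15 21:17:43']))

# tz_localize labels the wall-clock time with a zone; the clock reading is unchanged
as_utc = naive.dt.tz_localize('UTC')
as_tokyo = naive.dt.tz_localize('Asia/Tokyo')
print(as_utc.iloc[0])    # 2024-06-15 21:17:43+00:00
print(as_tokyo.iloc[0])  # 2024-06-15 21:17:43+09:00

# Same clock reading, different zones: these are different instants
print(as_tokyo.dt.tz_convert('UTC').iloc[0])  # 2024-06-15 12:17:43+00:00

# tz_convert needs tz-aware data; calling it on naive values raises TypeError
try:
    naive.dt.tz_convert('UTC')
except TypeError as exc:
    print('TypeError:', exc)
```

If the raw strings are already known to be UTC, pd.to_datetime(..., utc=True) parses and localizes them in one step.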
The ‘timestamp_utc’ values now carry more information, including the time zone offset (+00:00). We can convert them from one time zone to another. For example, the code below takes the UTC column and converts it to Japan time (Asia/Tokyo).
df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)

Output>>>
   transaction_id           timestamp  quantity             timestamp_utc  \
0               1 2023-06-15 12:00:05       100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02       200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43       150 2024-06-15 21:17:43+00:00

            timestamp_japan
0 2023-06-15 21:00:05+09:00
1 2024-04-16 00:20:02+09:00
2 2024-06-16 06:17:43+09:00
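Unlike tz_localize, tz_convert keeps the underlying instant and only changes how it is displayed. A small check using the second sample transaction, which lands on the next calendar day in Tokyo, makes this visible:

```python
import pandas as pd

utc = pd.Series(pd.to_datetime(['2024-04-15 15:20:02']).tz_localize('UTC'))
tokyo = utc.dt.tz_convert('Asia/Tokyo')

# The wall-clock reading moves forward 9 hours, past midnight
print(tokyo.iloc[0])          # 2024-04-16 00:20:02+09:00
print(tokyo.dt.date.iloc[0])  # 2024-04-16

# Both values still refer to the same moment in time
print(utc.iloc[0] == tokyo.iloc[0])            # True
print(tokyo.dt.tz_convert('UTC').equals(utc))  # True
```

The calendar date changing is worth keeping in mind when grouping by day: a daily summary in Japan time can place a transaction on a different date than the same summary in UTC.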
With the converted column, we can filter the data by a time window expressed in a particular time zone. For example, we can filter the data using Japan time.
start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')
filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]
print(filtered_df)

Output>>>
   transaction_id           timestamp  quantity             timestamp_utc  \
2               3 2024-06-15 21:17:43       150 2024-06-15 21:17:43+00:00

            timestamp_japan
2 2024-06-16 06:17:43+09:00
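Because comparisons between time zone-aware values are made on the underlying instant, the filter bounds do not have to be in the same zone as the column. In the sketch below, which rebuilds a trimmed version of the sample data, the same Japan-time window is written in UTC (Japan Standard Time is UTC+9 with no daylight saving) and selects the same row. Series.between is used here as a shorthand for the two comparisons. Bounds without any time zone, on the other hand, cannot be compared with a tz-aware column.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'timestamp': pd.to_datetime(['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43']),
})
df['timestamp_japan'] = df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Asia/Tokyo')

# The Japan-time window from the example above, re-expressed in UTC (JST = UTC+9)
start_utc = pd.Timestamp('2024-06-14 21:00:00', tz='UTC')
end_utc = pd.Timestamp('2024-06-15 22:59:59', tz='UTC')

# between() is inclusive on both ends by default, like the >= / <= filter above
mask = df['timestamp_japan'].between(start_utc, end_utc)
print(df.loc[mask, 'transaction_id'].tolist())  # [3]

# A naive bound has no zone to compare against, so pandas raises TypeError
naive_start = pd.Timestamp('2024-06-15 06:00:00')
try:
    df[df['timestamp_japan'] >= naive_start]
except TypeError as exc:
    print('TypeError:', exc)
```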
Working with time-series data also allows us to resample it. Let's look at an example that resamples the data into hourly bins and counts the non-null values in each column.
# 'h' is the hourly alias; the uppercase 'H' is deprecated as of pandas 2.2
resampled_df = df.set_index('timestamp_japan').resample('h').count()
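Because the three sample transactions are months apart, most of the hourly bins produced by this resample are empty. The sketch below rebuilds the sample data, resamples only the quantity column, and keeps the non-empty bins so the result is easier to read. The bin labels stay in Japan time because the index is time zone-aware.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'timestamp': pd.to_datetime(['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43']),
    'quantity': [100, 200, 150],
})
df['timestamp_japan'] = df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Asia/Tokyo')

# Count transactions per hour; bins run from the first to the last hour with data
hourly = df.set_index('timestamp_japan')['quantity'].resample('h').count()

# Drop the empty hours to see where the transactions fell (labels are in Japan time)
busy_hours = hourly[hourly > 0]
print(busy_hours)
```

Each remaining bin is labelled with the start of its hour, for example 2024-04-16 00:00:00+09:00 for the transaction recorded at 00:20:02 Japan time.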
Use Pandas’ time zone handling and timestamps to take full advantage of its time-series features.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.