Import modules
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as pyplot
Consider the following data points:
date                       | tick_numbers
2016-05-01 10:23:05.069722 | 3213
2016-05-01 10:23:05.119994 | 4324
2016-05-02 10:23:05.178768 | 2132
2016-05-02 10:23:05.230071 | 43242
2016-05-02 10:23:05.230071 | 4234
2016-05-02 10:23:05.280592 | 4234
2016-05-03 10:23:05.332662 | 4324
2016-05-03 10:23:05.385109 | 1245
2016-05-04 10:23:05.436523 | 1555
2016-05-04 10:23:05.486877 | 543345
Create a dataframe 'ts'
ts =
print(ts)
                         date  tick_numbers
0  2016-05-01 10:23:05.069722          3213
1  2016-05-01 10:23:05.119994          4324
2  2016-05-02 10:23:05.178768          2132
3  2016-05-02 10:23:05.230071         43242
4  2016-05-02 10:23:05.230071          4234
5  2016-05-02 10:23:05.280592          4234
6  2016-05-03 10:23:05.332662          4324
7  2016-05-03 10:23:05.385109          1245
8  2016-05-04 10:23:05.436523          1555
9  2016-05-04 10:23:05.486877        543345
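One possible way to fill in the blank above is sketched below. It assumes the ten data points are typed in by hand as a list of (date, tick_numbers) pairs; the name rows is only illustrative, and the dates stay plain strings at this stage.

```python
import pandas as pd

# The ten data points from the table above; 'date' is still a string here.
rows = [
    ("2016-05-01 10:23:05.069722", 3213),
    ("2016-05-01 10:23:05.119994", 4324),
    ("2016-05-02 10:23:05.178768", 2132),
    ("2016-05-02 10:23:05.230071", 43242),
    ("2016-05-02 10:23:05.230071", 4234),
    ("2016-05-02 10:23:05.280592", 4234),
    ("2016-05-03 10:23:05.332662", 4324),
    ("2016-05-03 10:23:05.385109", 1245),
    ("2016-05-04 10:23:05.436523", 1555),
    ("2016-05-04 10:23:05.486877", 543345),
]

# Build the dataframe with the two column names used in the exercise.
ts = pd.DataFrame(rows, columns=["date", "tick_numbers"])
print(ts)
```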
Convert ts['date'] from string to datetime and use the result as the index (ts.index).
ts.index =
Delete the now-redundant 'date' column with the del command
del
print(ts)
                            tick_numbers
date
2016-05-01 10:23:05.069722          3213
2016-05-01 10:23:05.119994          4324
2016-05-02 10:23:05.178768          2132
2016-05-02 10:23:05.230071         43242
2016-05-02 10:23:05.230071          4234
2016-05-02 10:23:05.280592          4234
2016-05-03 10:23:05.332662          4324
2016-05-03 10:23:05.385109          1245
2016-05-04 10:23:05.436523          1555
2016-05-04 10:23:05.486877        543345
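The two blanks above (ts.index = and del) could be completed along these lines. This is a sketch that assumes pd.to_datetime is the intended conversion; the helper lists dates and ticks are illustrative names used to rebuild the dataframe so the example runs on its own.

```python
import pandas as pd

dates = [
    "2016-05-01 10:23:05.069722", "2016-05-01 10:23:05.119994",
    "2016-05-02 10:23:05.178768", "2016-05-02 10:23:05.230071",
    "2016-05-02 10:23:05.230071", "2016-05-02 10:23:05.280592",
    "2016-05-03 10:23:05.332662", "2016-05-03 10:23:05.385109",
    "2016-05-04 10:23:05.436523", "2016-05-04 10:23:05.486877",
]
ticks = [3213, 4324, 2132, 43242, 4234, 4234, 4324, 1245, 1555, 543345]
ts = pd.DataFrame({"date": dates, "tick_numbers": ticks})

# Parse the date strings into timestamps and use them as the index.
ts.index = pd.to_datetime(ts["date"])

# The 'date' column now duplicates the index, so it can be deleted.
del ts["date"]
print(ts)
```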
Print all data from 2016
Print all data from May 2016
Print all data after May 3rd, 2016
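The three selections above can be sketched with partial string indexing on the DatetimeIndex. The variable names are illustrative. Note that a slice starting at "2016-05-03" includes May 3rd itself, so if "after" is meant strictly, the slice would start at "2016-05-04" instead.

```python
import pandas as pd

dates = [
    "2016-05-01 10:23:05.069722", "2016-05-01 10:23:05.119994",
    "2016-05-02 10:23:05.178768", "2016-05-02 10:23:05.230071",
    "2016-05-02 10:23:05.230071", "2016-05-02 10:23:05.280592",
    "2016-05-03 10:23:05.332662", "2016-05-03 10:23:05.385109",
    "2016-05-04 10:23:05.436523", "2016-05-04 10:23:05.486877",
]
ticks = [3213, 4324, 2132, 43242, 4234, 4234, 4324, 1245, 1555, 543345]
ts = pd.DataFrame({"tick_numbers": ticks}, index=pd.to_datetime(dates))

# A year string selects every row whose timestamp falls in that year.
year_2016 = ts.loc["2016"]

# A year-month string selects every row in that month.
may_2016 = ts.loc["2016-05"]

# A slice with a date string keeps that day and everything later.
from_may_3 = ts.loc["2016-05-03":]

print(year_2016)
print(may_2016)
print(from_may_3)
```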
Remove all the data after May 2nd, 2016 using truncate
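A sketch of the truncate step follows. One detail worth knowing: truncate reads a date string without a time as midnight, so after="2016-05-02" also drops the rows stamped later on May 2nd. Whether May 2nd should be kept is an assumption about the exercise's intent, so both variants are shown under illustrative names.

```python
import pandas as pd

dates = [
    "2016-05-01 10:23:05.069722", "2016-05-01 10:23:05.119994",
    "2016-05-02 10:23:05.178768", "2016-05-02 10:23:05.230071",
    "2016-05-02 10:23:05.230071", "2016-05-02 10:23:05.280592",
    "2016-05-03 10:23:05.332662", "2016-05-03 10:23:05.385109",
    "2016-05-04 10:23:05.436523", "2016-05-04 10:23:05.486877",
]
ticks = [3213, 4324, 2132, 43242, 4234, 4234, 4324, 1245, 1555, 543345]
ts = pd.DataFrame({"tick_numbers": ticks}, index=pd.to_datetime(dates))

# Cut at 2016-05-02 00:00:00: only the May 1st rows survive.
up_to_midnight = ts.truncate(after="2016-05-02")

# Cut at 2016-05-03 00:00:00: all of May 1st and May 2nd survive.
through_may_2 = ts.truncate(after="2016-05-03")

print(up_to_midnight)
print(through_may_2)
```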
Count the number of data points per timestamp
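One way to count rows per timestamp, assuming a group-by on the index is acceptable for this step, is sketched below. Only 2016-05-02 10:23:05.230071 appears twice in the data, so it is the only timestamp with a count above 1. The name counts is illustrative.

```python
import pandas as pd

dates = [
    "2016-05-01 10:23:05.069722", "2016-05-01 10:23:05.119994",
    "2016-05-02 10:23:05.178768", "2016-05-02 10:23:05.230071",
    "2016-05-02 10:23:05.230071", "2016-05-02 10:23:05.280592",
    "2016-05-03 10:23:05.332662", "2016-05-03 10:23:05.385109",
    "2016-05-04 10:23:05.436523", "2016-05-04 10:23:05.486877",
]
ticks = [3213, 4324, 2132, 43242, 4234, 4234, 4324, 1245, 1555, 543345]
ts = pd.DataFrame({"tick_numbers": ticks}, index=pd.to_datetime(dates))

# Group rows that share the same index value and count each group.
counts = ts.groupby(level=0).size()
print(counts)
```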
Compute the mean value of ticks per day. Use resample with a frequency of 'D' and the mean method.
Compute the total value of ticks per day. Use resample with a frequency of 'D' and the sum method.
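The two daily aggregations can be sketched as follows. The names daily_mean and daily_total are illustrative; the dataframe is rebuilt here so the example runs on its own.

```python
import pandas as pd

dates = [
    "2016-05-01 10:23:05.069722", "2016-05-01 10:23:05.119994",
    "2016-05-02 10:23:05.178768", "2016-05-02 10:23:05.230071",
    "2016-05-02 10:23:05.230071", "2016-05-02 10:23:05.280592",
    "2016-05-03 10:23:05.332662", "2016-05-03 10:23:05.385109",
    "2016-05-04 10:23:05.436523", "2016-05-04 10:23:05.486877",
]
ticks = [3213, 4324, 2132, 43242, 4234, 4234, 4324, 1245, 1555, 543345]
ts = pd.DataFrame({"tick_numbers": ticks}, index=pd.to_datetime(dates))

# Bin the rows into calendar days, then average each bin.
daily_mean = ts.resample("D").mean()

# Same daily bins, but summed instead of averaged.
daily_total = ts.resample("D").sum()

print(daily_mean)
print(daily_total)
```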
Plot the total of ticks per day
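A minimal plotting sketch, assuming a bar chart is an acceptable way to show the four daily totals. The Agg backend is selected only so the example also runs without a display; remove that line and call pyplot.show() for interactive use. The title and axis labels are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as pyplot
import pandas as pd

dates = [
    "2016-05-01 10:23:05.069722", "2016-05-01 10:23:05.119994",
    "2016-05-02 10:23:05.178768", "2016-05-02 10:23:05.230071",
    "2016-05-02 10:23:05.230071", "2016-05-02 10:23:05.280592",
    "2016-05-03 10:23:05.332662", "2016-05-03 10:23:05.385109",
    "2016-05-04 10:23:05.436523", "2016-05-04 10:23:05.486877",
]
ticks = [3213, 4324, 2132, 43242, 4234, 4234, 4324, 1245, 1555, 543345]
ts = pd.DataFrame({"tick_numbers": ticks}, index=pd.to_datetime(dates))

# Total ticks per calendar day, as a Series.
daily_total = ts["tick_numbers"].resample("D").sum()

# One bar per day; pandas returns the matplotlib Axes it drew on.
ax = daily_total.plot(kind="bar", title="Total ticks per day")
ax.set_xlabel("date")
ax.set_ylabel("tick_numbers")
pyplot.tight_layout()
```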
Create another dataframe
np.random.seed(12345)
- create a dictionary (named volume in the sample output below)
- volume['ARCA'] = np.random.randint(low=20000, high=30000, size=62)
- volume['BARX'] = np.random.randint(low=20000, high=30000, size=62)
- index = pd.date_range('4/1/2012', '6/1/2012')
- create the dataframe with the 3 components above
print(df)
pd.DataFrame(volume, index=index).head()
Out[90]:
             ARCA   BARX
2012-04-01  24578  28633
2012-04-02  22177  26542
2012-04-03  23492  26554
2012-04-04  24094  21707
2012-04-05  24478  25568
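The construction steps above can be sketched as follows. The size of 62 matches the number of days from 2012-04-01 to 2012-06-01 inclusive, which is why the arrays line up with the date index. The dictionary name volume follows the sample output; df is the name used in the exercise.

```python
import numpy as np
import pandas as pd

# Fix the seed so the random volumes are reproducible.
np.random.seed(12345)

# One array of 62 random daily volumes per ticker.
volume = {
    "ARCA": np.random.randint(low=20000, high=30000, size=62),
    "BARX": np.random.randint(low=20000, high=30000, size=62),
}

# Daily dates from April 1st to June 1st, 2012 (62 days, both ends included).
index = pd.date_range("4/1/2012", "6/1/2012")

df = pd.DataFrame(volume, index=index)
print(df.head())
```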
Truncate the dataframe to keep only the data between the two dates (before='2012-04-04', after='2012-05-24')
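A sketch of this truncation, with the dataframe rebuilt so the example runs on its own. Because the index holds midnight timestamps, both boundary dates are kept. The name window is illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
volume = {
    "ARCA": np.random.randint(low=20000, high=30000, size=62),
    "BARX": np.random.randint(low=20000, high=30000, size=62),
}
index = pd.date_range("4/1/2012", "6/1/2012")
df = pd.DataFrame(volume, index=index)

# Drop rows before April 4th and rows after May 24th (both ends are kept).
window = df.truncate(before="2012-04-04", after="2012-05-24")
print(window.head())
print(window.tail())
```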
Offset the index of the dataframe by pd.DateOffset(months=1, days=1)
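One reading of this step is to add the offset to every date in the index, as sketched below. That interpretation is an assumption, and offset_df is an illustrative name. Depending on the pandas version, adding a DateOffset to an index may emit a performance warning, which does not affect the result.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
volume = {
    "ARCA": np.random.randint(low=20000, high=30000, size=62),
    "BARX": np.random.randint(low=20000, high=30000, size=62),
}
index = pd.date_range("4/1/2012", "6/1/2012")
df = pd.DataFrame(volume, index=index)

# Work on a copy so the original dataframe keeps its dates.
offset_df = df.copy()

# Move every date forward by one month and one day; values are untouched.
offset_df.index = df.index + pd.DateOffset(months=1, days=1)
print(offset_df.head())
```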
Shift the dataframe by 1 day
Lag a variable 1 day
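The two steps above differ in what moves. Shifting with freq moves the dates and leaves the values in place, while shifting without freq moves the values down one row and leaves the dates in place, which is the usual way to build a lagged variable. The sketch below assumes that reading; the names moved, lagged and ARCA_lag1 are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
volume = {
    "ARCA": np.random.randint(low=20000, high=30000, size=62),
    "BARX": np.random.randint(low=20000, high=30000, size=62),
}
index = pd.date_range("4/1/2012", "6/1/2012")
df = pd.DataFrame(volume, index=index)

# Shift the whole dataframe by one day: every date moves forward, values stay.
moved = df.shift(1, freq="D")

# Lag one variable by one day: each row gets the previous day's ARCA value,
# and the first row has no predecessor, so it becomes NaN.
lagged = df.assign(ARCA_lag1=df["ARCA"].shift(1))

print(moved.head())
print(lagged.head())
```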
Aggregate into 2W-SUN (two-week periods anchored on Sunday) by summing the daily volumes
Aggregate into weeks by averaging the daily volumes
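Both aggregations can be sketched with resample. In pandas the plain "W" alias is anchored on Sunday as well, so the labels of both results fall on Sundays. The names biweekly_total and weekly_mean are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
volume = {
    "ARCA": np.random.randint(low=20000, high=30000, size=62),
    "BARX": np.random.randint(low=20000, high=30000, size=62),
}
index = pd.date_range("4/1/2012", "6/1/2012")
df = pd.DataFrame(volume, index=index)

# Two-week bins anchored on Sunday, summing the daily volumes in each bin.
biweekly_total = df.resample("2W-SUN").sum()

# One-week bins (also anchored on Sunday), averaging the daily volumes.
weekly_mean = df.resample("W").mean()

print(biweekly_total)
print(weekly_mean)
```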