Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

gnuplot: How do I extract a date from a string to use it with timecolumn?

In some of my files, date/time is encoded in a character string, e.g. "subject_name/20221006085330_test_something". The following function returns the date (in this case "20221006"):

get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )

How can this function be used together with

stats $DAT i selected_block(1) u ( timecolumn( 2, timeFmt ) ) 

Replacing 2 with get_date("string") does not work, and I cannot explain the meaning of this error: unknown type in magnitude().

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

# Test script:
reset session

$DAT<<EOD
# cross
string    date     
subject001_name/20200203144309_session021_id     20200203
subject002_name/20200522103745_session079_id     20200522
subject003_name/20210521172330_session201_id     20210521
subject004_name/20220527130250_session047_id     20220527
subject005_name/20220701171757_session081_id     20220701
subject006_name/20220706102238_session003_id     20220706
subject007_name/20221006110417_session081_id     20221006
subject008_name/20230309144707_session901_id     20230309
subject009_name/20230907080602_session003_id     20230907
subject010_name/20240919143733_session022_id     20240919 
EOD

get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )

blocks              = "cross"
selected_block(i)   = word( blocks, i )

timeFmt             = "%Y%m%d"

stats $DAT i selected_block(1) u ( timecolumn( 2, timeFmt ) ) 

pause -1 

stats $DAT i selected_block(1) u ( timecolumn( get_date( "string" ), timeFmt ) ) 

Edit: The following adapted script works, but the statistics show 11 instead of 10 records.

# Test script:
reset session

$DAT<<EOD
# cross
string    date     
subject001_name/20200203144309_session021_id     20200203 
subject002_name/20200522103745_session079_id     20200522 
subject003_name/20210521172330_session201_id     20210521 
subject004_name/20220527130250_session047_id     20220527 
subject005_name/20220701171757_session081_id     20220701 
subject006_name/20220706102238_session003_id     20220706 
subject007_name/20221006110417_session081_id     20221006 
subject008_name/20230309144707_session901_id     20230309 
subject009_name/20230907080602_session003_id     20230907 
subject010_name/20240919143733_session022_id     20240919 
EOD

get_date(c) = substr( strcol(c), strstrt( strcol(c), "/" ) + 1, strstrt( strcol(c), "/" ) + 8 )

blocks              = "cross"
selected_block(i)   = word( blocks, i )

timeFmt             = "%Y%m%d"
date_column(col)    = strptime( timeFmt, get_date(col) )

stats $DAT i selected_block(1) u ( date_column("string") ) 

>Solution :

Well, the short answer would be: use strptime() instead.

The function timecolumn(col,fmt) is explictely for the column number and the date/time format. You cannot insert a function to extract the date/time first. Use strptime() instead (check help strptime).

Script:

### extract date/time from a string for stats
reset session

$Data <<EOD
subject001_name/20221006085330_test_something    1   2   3
subject002_name/20231007090000_test_something    4   5   6
subject003_name/20241007101111_test_something    7   8   9
EOD

myFmt             = "%Y%m%d%H%M%S"
get_datetime(col) = substr(_s=strcol(col), _t=strstrt(_s,"/")+1, _t+13)
t(col)            = strptime(myFmt, get_datetime(col))

set table $DateTime
    plot $Data u (get_datetime(1)) w table
unset table
print $DateTime

stats $Data u (t(1)) nooutput
print "Min: ", STATS_min, "  =  ", strftime("%Y-%m-%d %H:%M:%S",STATS_min)
print "Max: ", STATS_max, "  =  ", strftime("%Y-%m-%d %H:%M:%S",STATS_max)
### end of script

Result:

20221006085330
20231007090000
20241007101111

Min: 1665046410.0  =  2022-10-06 08:53:30
Max: 1728295871.0  =  2024-10-07 10:11:11

Addition 1:

If your substring before the date was constant (which is not in your case) you could have used timecolumn() in the following way.

myFmt = "subject_name/%Y%m%d%H%M%S"
stats $Data u (timecolumn(1,myFmt)) nooutput

Addition 2:

Yet, another method using timecolumn() with variable pre-string. Before stats set the datafile separator to / and consider column 2 as the date/time column. However, this again implies that you have the same number of slashes / before the date.

set datafile separator "/"
myFmt = "%Y%m%d%H%M%S"
stats $Data u (timecolumn(2,myFmt)) nooutput
set datafile separator whitespace
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading