Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I find the date for which a model from family = poisson(link = "log") predicts a specified value?

I have data that consists of values on dates, here represented as timestamps:

json <- structure(list(creation_date = c(1689356840L, 1689366017L, 1689376446L, 
1689504809L, 1690366268L, 1690472012L, 1690478723L, 1690701972L, 
1690701973L, 1690746389L, 1690746392L, 1690831235L, 1691022236L, 
1692097787L, 1692625463L, 1692699453L, 1692828527L, 1692828527L, 
1692828550L, 1692829734L, 1692878775L, 1692879861L, 1693058927L, 
1693082719L, 1693106370L, 1693115869L, 1693120632L, 1693121481L, 
1693137142L, 1693164782L, 1693172783L, 1693175185L, 1693208767L, 
1693909005L, 1693914013L, 1693921364L, 1693928549L, 1693930560L, 
1693999563L, 1694001298L, 1694003421L, 1694003758L, 1694005624L, 
1694015620L, 1694015743L, 1694074163L, 1694153769L, 1694181062L, 
1694194729L, 1694194740L, 1694194740L, 1694194744L, 1694209208L, 
1694594207L, 1694798845L, 1694798846L, 1694809497L, 1694883734L, 
1694901062L, 1695124304L, 1695221997L, 1695222830L, 1695259750L, 
1695259766L, 1695260229L, 1695283945L, 1695329985L, 1695330735L, 
1695363048L, 1695399875L, 1695403486L, 1695409526L, 1695458043L, 
1695464399L, 1695469820L, 1695472988L, 1695479007L, 1695483407L, 
1695485270L, 1695485763L, 1695485763L, 1695493324L, 1695505234L, 
1695505235L, 1695509011L, 1695517370L, 1695527643L, 1695556326L, 
1695567795L, 1695567798L, 1695575998L, 1695576929L, 1695589633L, 
1695596154L, 1695607375L, 1695633830L, 1695657647L, 1695658242L, 
1695710762L, 1695719101L), cumulative_reputation = c(100L, 110L, 
120L, 130L, 140L, 150L, 160L, 170L, 185L, 200L, 185L, 195L, 205L, 
215L, 213L, 228L, 230L, 240L, 230L, 240L, 250L, 252L, 262L, 272L, 
282L, 292L, 302L, 312L, 322L, 332L, 342L, 352L, 362L, 372L, 382L, 
392L, 402L, 412L, 422L, 432L, 442L, 452L, 462L, 472L, 482L, 492L, 
502L, 512L, 522L, 512L, 510L, 512L, 522L, 532L, 547L, 557L, 567L, 
577L, 587L, 597L, 607L, 617L, 632L, 617L, 627L, 637L, 647L, 657L, 
672L, 682L, 692L, 707L, 717L, 727L, 737L, 747L, 745L, 755L, 757L, 
747L, 745L, 755L, 753L, 755L, 765L, 775L, 785L, 795L, 805L, 795L, 
805L, 815L, 825L, 835L, 845L, 847L, 857L, 867L, 877L, 887L)), row.names = c(670L, 
669L, 668L, 667L, 666L, 665L, 664L, 663L, 662L, 661L, 660L, 659L, 
658L, 657L, 656L, 655L, 653L, 654L, 652L, 651L, 650L, 649L, 648L, 
647L, 646L, 645L, 644L, 643L, 642L, 641L, 640L, 639L, 638L, 637L, 
636L, 635L, 634L, 633L, 632L, 631L, 630L, 629L, 628L, 627L, 626L, 
625L, 624L, 623L, 622L, 620L, 621L, 619L, 618L, 617L, 616L, 615L, 
614L, 613L, 612L, 611L, 610L, 609L, 608L, 607L, 606L, 605L, 604L, 
603L, 602L, 601L, 600L, 599L, 598L, 597L, 596L, 595L, 594L, 593L, 
592L, 590L, 591L, 589L, 588L, 587L, 586L, 585L, 584L, 583L, 582L, 
581L, 580L, 579L, 578L, 577L, 576L, 575L, 574L, 573L, 572L, 571L
), class = "data.frame")

I have fit a model to that data and can now calculate the predicted values:

model <- glm(cumulative_reputation ~ creation_date, data = json, family = poisson(link = "log"))
prediction <- predict(model, json)

plot of the measured and the predicted values

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

How can I find the date (or timestamp) for which the model predicts a specific value?

To find that date, I have calculated values from the last date of measurement (1695719101) to a year into the future (1695719101 + 365 * 24 * 60 * 60) and then searched the resulting values for the one closest to the one I want (cumulative_reputation = 10000).

Is there a way to calculate this date (or timestamp) directly?


Note:

This question is a follow up to this question.

>Solution :

The model gives you an equation for the curve, which you can then use for root-finding against a specific value. For example:

## This is essentially what 'predict(model, type="response")' does here
COEF <- coef(model)
pred <- function(t) exp(COEF[1] + t*COEF[2])

## Find the root for a certain target response
optfn <- function(t, target=10000) pred(t) - target
uniroot(optfn, c(1E9, 5E9))$root |> as.POSIXct(tz="UTC")

#> "2023-12-20 12:07:04 UTC"

Just to confirm that this timestamp is indeed the solution:

predict(model, list(creation_date=1703074025), type="response")
#> 10000

Speaking briefly for the statistical aspect, extrapolating so far beyond the observed data might be questionable.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading