FlowingData Forums » Data Visualization

Highly variable data

Started 3 years ago by SoccerNut / 7 posts

  1. I have a data set with a large range of values of interest (2-3 orders of magnitude). Let's say these values are next year's forecast expense for a line item in a budget. The distribution has a long, heavy tail.

    Yes, it's awful that we can't have a more accurate forecast, but for now I need to show a non-technical audience something about the data to help them plan their budget.

    Is there anything better than a histogram with some interactive display of "x% likely the cost will be < $Y"?

  2. hmm, some sample data might be helpful here.

  3. OK, here's some data. Low and High are the bucket range, cumulative % is the % of population with $ amounts below the high end of that range.

    The average of this data set is $1,079, but this fails to convey any sense of the chance that the cost could be a lot more or less. Nearly 70% of people have costs below $500.

    So far my best attempt is a plot of the CDF against cost, using a log scale for the cost axis.

    Bucket  Low        High         Cumulative %  Average $ in the bucket
    1       $0         $0           39.38%        $0
    2       $0.01      $500         69.17%        $158
    3       $500       $1,000       77.05%        $722
    4       $1,000     $2,000       84.79%        $1,426
    5       $2,000     $3,000       88.65%        $2,453
    6       $3,000     $4,000       90.91%        $3,464
    7       $4,000     $5,000       92.51%        $4,466
    8       $5,000     $6,000       93.68%        $5,467
    9       $6,000     $7,000       94.54%        $6,488
    10      $7,000     $8,000       95.20%        $7,495
    11      $8,000     $9,000       95.72%        $8,478
    12      $9,000     $10,000      96.18%        $9,483
    13      $10,000    $12,500      96.98%        $11,147
    14      $12,500    $15,000      97.53%        $13,681
    15      $15,000    $20,000      98.20%        $17,240
    16      $20,000    $25,000      98.67%        $22,251
    17      $25,000    $30,000      98.98%        $27,369
    18      $30,000    $40,000      99.36%        $34,663
    19      $40,000    $50,000      99.56%        $44,671
    20      $50,000    $75,000      99.80%        $59,863
    21      $75,000    $100,000     99.89%        $85,281
    22      $100,000   $150,000     99.95%        $120,824
    23      $150,000   $200,000     99.98%        $172,166
    24      $200,000   $300,000     99.99%        $239,241
    25      $300,000   $500,000     99.998%       $352,888
    26      $500,000   $750,000     99.9994%      $582,382
    27      $750,000   $10,000,000  100.00%       $966,700
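
    For the "x% likely the cost will be < $Y" style of statement, one possible approach is a lookup against the cumulative column. The following is only a rough sketch: the names `CDF` and `cost_ceiling` are made up for illustration, the numbers are copied from the table above, and the lookup returns whole bucket boundaries rather than interpolating within a bucket.

```python
from bisect import bisect_left

# (high end of bucket in $, cumulative % up to that high end), copied from the table.
CDF = [
    (0, 39.38), (500, 69.17), (1_000, 77.05), (2_000, 84.79), (3_000, 88.65),
    (4_000, 90.91), (5_000, 92.51), (6_000, 93.68), (7_000, 94.54), (8_000, 95.20),
    (9_000, 95.72), (10_000, 96.18), (12_500, 96.98), (15_000, 97.53),
    (20_000, 98.20), (25_000, 98.67), (30_000, 98.98), (40_000, 99.36),
    (50_000, 99.56), (75_000, 99.80), (100_000, 99.89), (150_000, 99.95),
    (200_000, 99.98), (300_000, 99.99), (500_000, 99.998), (750_000, 99.9994),
    (10_000_000, 100.00),
]


def cost_ceiling(target_pct):
    """Return (high, cumulative %) for the first bucket whose cumulative % reaches target_pct."""
    cumulative = [pct for _, pct in CDF]
    i = bisect_left(cumulative, target_pct)  # first index with cumulative >= target
    if i == len(CDF):
        raise ValueError("target_pct must not exceed 100")
    return CDF[i]


for target in (50, 90, 99):
    high, pct = cost_ceiling(target)
    print(f"{pct:.2f}% of the time the cost is no more than ${high:,}")
```

    Because the buckets are coarse, asking for 50% returns the $500 boundary at 69.17%, so an interactive display built this way would probably want to show the actual percentage reached, not the one requested.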

  4. i am not sure what you are trying to show.

    perhaps it would help to have a little story, using the data above, what are the key things you wish to highlight? what insight do you want this graphic to deliver?

    The problem is trying to understand your data - you use terminology that requires some background understanding which I don't have.

  5. You wish to set money aside for self-insurance. To plan how much money to set aside, you need to understand the possible costs and the likelihood of those costs.

    The data above show that about 39% of the time your cost will be zero. This means you experienced no bad events, so had no expenses.

    69% of the time your cost will be less than $500. 91% of the time your cost will be less than $4,000 (the cumulative % is defined relative to the high end of the bucket).

    The chance of your cost being between $3,000 and $4,000 is 90.91% - 88.65% = 2.26%. If your cost is between $3,000 and $4,000, then its average is $3,464.
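
    The same subtraction can be applied to every bucket. Here is a minimal sketch for the first seven buckets; `bucket_shares` is a hypothetical helper name, and `Decimal` is used only so that the percentages subtract exactly instead of picking up floating-point noise.

```python
from decimal import Decimal

# Cumulative % at the high end of buckets 1 to 7, copied from the table in post 3.
cumulative = ["39.38", "69.17", "77.05", "84.79", "88.65", "90.91", "92.51"]


def bucket_shares(cum_pcts):
    """Share of outcomes in each bucket: its cumulative % minus the previous bucket's."""
    shares = []
    previous = Decimal("0")
    for text in cum_pcts:
        current = Decimal(text)
        shares.append(current - previous)
        previous = current
    return shares


shares = bucket_shares(cumulative)
print(shares[5])  # bucket 6, $3,000 to $4,000
```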

  6. assuming you want an interactive display.

    i would draw a block of 1000 little people icons in 20 rows of 50 silhouettes.

    then let the person select a threshold:

    eg: what are the chances of my costs being zero? (about 390 little people light up)

    eg: what are the chances of my costs being less than $500? (about 690 little people light up)

    eg: what are the chances of my costs exceeding $75,000? (2 little people light up)

    you could also provide some context giving the average amount and the exact % once a threshold is selected.
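
    A text-only mock-up of that grid might look like the sketch below. It assumes the 20 x 50 layout from this post and uses `#` and `.` as stand-ins for lit and unlit silhouettes; `lit_count` and `render` are illustrative names, not part of any existing tool. With the exact table percentages, rounding gives 394 and 692 icons for the first two questions, close to the rounded figures quoted above.

```python
ROWS, COLS = 20, 50  # 1000 icons in total, as suggested in the post


def lit_count(pct, total=ROWS * COLS):
    """Number of icons to light up for a probability given in percent."""
    return round(pct / 100 * total)


def render(pct, lit="#", unlit="."):
    """Return the grid as text, filling lit icons row by row from the top left."""
    n = lit_count(pct)
    cells = lit * n + unlit * (ROWS * COLS - n)
    return "\n".join(cells[r * COLS:(r + 1) * COLS] for r in range(ROWS))


# Cost of zero: 39.38% in the table.
print(render(39.38))
# Cost above $75,000: 100% minus the 99.80% cumulative share.
print(lit_count(100 - 99.80), "icons for costs above $75,000")
```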

  7. I'd show the PDF, not the CDF.

    You could cite other probabilities next to each bin, from "getting two of a kind on your first deal in poker" to "dying in a streetcar accident".
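
    One caveat if you go the PDF route: the buckets in post 3 are not all the same width, so bar heights are usually scaled by bucket width to stay comparable. A rough sketch for the first five buckets follows; the `densities` helper and the "% per $100" unit are choices made here for illustration, and the zero-width $0 bucket is treated as a separate point mass.

```python
# (low $, high $, cumulative %) for buckets 1 to 5, copied from the table in post 3.
BUCKETS = [
    (0.0, 0.0, 39.38),
    (0.01, 500.0, 69.17),
    (500.0, 1000.0, 77.05),
    (1000.0, 2000.0, 84.79),
    (2000.0, 3000.0, 88.65),
]


def densities(buckets):
    """Return (low, high, share %, % per $100 of width); zero-width buckets get None."""
    rows = []
    previous = 0.0
    for low, high, cum in buckets:
        share = cum - previous  # probability mass in this bucket
        width = high - low
        per_100 = share / width * 100 if width > 0 else None
        rows.append((low, high, share, per_100))
        previous = cum
    return rows


for low, high, share, per_100 in densities(BUCKETS):
    height = "point mass" if per_100 is None else f"{per_100:.2f}% per $100"
    print(f"${low:,.2f} to ${high:,.2f}: {share:.2f}% of outcomes, {height}")
```

    Buckets 3 and 4 hold almost the same share of outcomes (7.88% and 7.74%), but bucket 4 is twice as wide, so its bar would be about half as tall.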

