Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Austin Nichols <austinnichols@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: About taking log on zero values
Date: Thu, 20 Feb 2014 07:18:40 -0500

Sebastian Say <sebastian.statalist@gmail.com>

Sorry, that should be addressed to Sebastian Say, Alfonso, Jeph,
Maarten, and Nick (not just Alfonso):

On Thu, Feb 20, 2014 at 7:16 AM, Austin Nichols <austinnichols@gmail.com> wrote:
> Alfonso:
>
> Whether sales=0 means
> "literally nothing" or "so small that it could not be detected"
> you can't do any of the things suggested without introducing bias.
> In the former case, you must run separate models for cases with and
> without sales (or fully interact by a dummy variable nosales) while in
> the latter case you must multiply impute sales using a sensible model,
> not simply add a constant.
>
>
> On Thu, Feb 20, 2014 at 4:57 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>> One option you could also consider is that you treat the value 0 as
>> special which needs its own effect. This depends whether 0 means
>> "literally nothing" or "so small that it could not be detected". In the
>> former case you would often want to treat the value 0 as qualitatively
>> different, while in the latter case adding a small but not too small
>> number to the 0 values could be justified.
>>
>> In case that you would want to treat the value 0 as qualitatively
>> different, then I would do something like this:
>>
>> gen byte nosales = (sales == 0) if sales < .
>> gen logsales = ln(sales)
>> sum logsales, meanonly
>> replace logsales = r(min) if nosales == 1
>> reg y x1 x2 logsales nosales
>>
>> In that case the coefficient for logsales can be interpreted as
>> before, but refers only to sales > 0. The coefficient for nosales
>> represents the difference in expected value of y between those units
>> with no sales at all and those units with the smallest non-zero sales.
>>
>> Hope this helps,
>> Maarten
>>
>>
>> On Wed, Feb 19, 2014 at 9:11 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Stata would ignore numeric missings in anything like a regression calculation.
>>>
>>> That applies also to missings that result from calculating log(0).
>>>
>>> Changing values of 0 to values of 1 so that you can take logarithms is
>>> not something I would call "usual practice". It is, I suspect,
>>> regarded differently by different people on a spectrum from unethical
>>> and incorrect to an acceptable fudge, depending partly on the rest of
>>> the data and what you are doing with them.
>>>
>>> An incomplete list of things to think about:
>>>
>>> 0. If values of 1 occur otherwise, you have created an inconsistency.
>>> If values between 0 and 1 occur otherwise, you have created a bigger
>>> one. Applying log(x + 1) consistently solves this problem only by
>>> creating another. Applying log(x + 1) and pretending that it is really
>>> applying log(x) is not widely accepted.
>>>
>>> 1. If 0 really means what it says, changing it to 1 is a
>>> falsification. Whether you can put a spin on it as an acceptable or
>>> necessary falsification is an open question.
>>>
>>> 2. If 0 really means "small but not detected", changing it to e.g.
>>> half smallest observable value is sometimes an accepted or acceptable
>>> modification.
>>>
>>> 3. Replacing log(0) with log(1) is not, necessarily, even a small and
>>> conservative modification. If apart from the values of 0 values range
>>> from e^3 to e^6 then after logging you have 0 and otherwise a range of 3
>>> to 6. You may have _created_ a bundle of outliers that will dominate
>>> analyses.
>>>
>>> 4. Doing something about 0s is only necessary with logarithmic
>>> transformation. If you have 0s in the response, you can leave them and
>>> use a logarithmic link. That won't necessarily be a good model, but
>>> using a logarithmic link doesn't require positive values in the
>>> response, only that the mean function be always positive. (This
>>> doesn't apply in your case as the variable in question is a
>>> predictor.)
>>>
>>> 5. There are usually alternatives, such as transformations other than
>>> logarithms.
>>>
>>> 6. I wouldn't do anything without considering some kind of sensitivity
>>> analysis, i.e. a consideration of how much difference an arbitrary
>>> treatment of zeros makes.
>>>
>>> 7. There is often an argument that implies that the observations with
>>> zeros don't belong anyway.
>>>
>>> (I have generalised your question, but suspect that zero values for
>>> sales usually mean exactly what they say.)
>>>
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>> On 19 February 2014 19:44, Sebastian Say
>>> <sebastian.statalist@gmail.com> wrote [edited]
>>>
>>>> My question is about how Stata treats a log-transformed variable
>>>> that draws upon an original variable that contains zero.
>>>>
>>>> In my dataset, I have firm sales data but some of them have values of zero. I
>>>> created a logsales variable and noticed that those with zeros are
>>>> indicated as a "."
>>>>
>>>> I plan to run a regression, e.g.
>>>>
>>>> reg y x1 x2 logsales
>>>>
>>>> My question is, how would Stata treat these "." if I do not remove them?
>>>>
>>>> Technically the "." should be undefined.
>>>>
>>>> I've read some papers and they usually put a 1 for those sales data
>>>> with zeros in them. Is this a usual practice?
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> ---------------------------------
>> Maarten L. Buis
>> WZB
>> Reichpietschufer 50
>> 10785 Berlin
>> Germany
>>
>> http://www.maartenbuis.nl
>> ---------------------------------
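
Editorial illustration (not part of the original messages): the three treatments of zero discussed in this thread can be compared on a toy example. The sketch below is written in plain Python rather than Stata so that it runs anywhere; the sales figures are hypothetical and chosen only to match the e^3 to e^6 range in point 3 of Nick's list, and the names `log_or_none`, `log_plain`, `log_recode`, and `log_flagged` are made up for this example. `None` stands in for Stata's missing value ".".

```python
import math

# Hypothetical sales figures (illustrative only, not from the thread):
# two zeros plus positive values spanning e^3 to e^6.
sales = [0.0, 0.0, math.exp(3), math.exp(4), math.exp(5), math.exp(6)]


def log_or_none(x):
    """Natural log for x > 0; None plays the role of Stata's missing '.'."""
    return math.log(x) if x > 0 else None


# (a) Plain log: the zeros become missing, so those observations
#     would be left out of a regression on the logged variable.
log_plain = [log_or_none(s) for s in sales]

# (b) Zeros recoded to 1 before logging: they land at log(1) = 0,
#     well below the remaining values, which lie between about 3 and 6.
log_recode = [math.log(s if s > 0 else 1.0) for s in sales]

# (c) Indicator approach along the lines of Maarten's message: flag the
#     zeros and set their log to the smallest observed log, so that the
#     flag (nosales) carries the effect of having no sales.
nosales = [1 if s == 0 else 0 for s in sales]
min_log = min(v for v in log_plain if v is not None)
log_flagged = [v if v is not None else min_log for v in log_plain]

print("plain:  ", log_plain)
print("recoded:", log_recode)
print("flagged:", log_flagged, "nosales:", nosales)
```

In (b) the recoded zeros sit a full three log units below the smallest positive observation, which is the "bundle of outliers" risk Nick describes. In (c) the flagged observations share the minimum log value, so any difference for firms without sales would be absorbed by the indicator rather than by the slope. Neither variant addresses Austin's objection that these fixes can introduce bias.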

**References**:

- **st: About taking log on zero values** *From:* Sebastian Say <sebastian.statalist@gmail.com>
- **Re: st: About taking log on zero values** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: About taking log on zero values** *From:* Maarten Buis <maartenlbuis@gmail.com>
- **Re: st: About taking log on zero values** *From:* Austin Nichols <austinnichols@gmail.com>
