I’ve previously written about a bi-annual seasonality pattern in US equity markets: https://rrspstrategy.wordpress.com/2014/05/16/bi-annual-seasonality/

The quarterly average market (Mkt-RF) returns from 1950 to present are shown below (data from Ken French’s library). Quarters 1-4 are even years and 5-8 are odd years.

The table shows that mean returns of quarters 4-6 are greater than zero with high significance (t-stat > 2.3).

Except for Q8, which is marginal, none of the other quarterly means (including the negative ones) is statistically different from zero (|t-stat| < 2). It is therefore not possible to profit from this effect by excluding negative periods, hence the ‘partial’ debunking.

Two caveats apply to these test results: each quarterly dataset is small (32 points), and financial returns are not normally distributed.
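For readers who want to reproduce the test, here is a minimal sketch of the per-quarter calculation described above. It assumes the returns are available as `(year, quarter, return)` tuples; the function names and the toy numbers are illustrative only and are not the Ken French data.

```python
import math
import statistics


def cycle_position(year, quarter):
    """Map a calendar quarter (1-4) to its slot in the two-year cycle.

    Even years map to 1-4 and odd years to 5-8, as in the post.
    """
    return quarter if year % 2 == 0 else quarter + 4


def t_stat(returns):
    """One-sample t-statistic for the hypothesis that the mean is zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def t_stats_by_position(observations):
    """Group (year, quarter, return) tuples by cycle slot and test each group."""
    groups = {}
    for year, quarter, ret in observations:
        groups.setdefault(cycle_position(year, quarter), []).append(ret)
    return {pos: t_stat(rets) for pos, rets in sorted(groups.items())}


# Toy input (made-up numbers, for illustration only).
toy = [(2000, 1, 1.0), (2002, 1, 2.0), (2004, 1, 3.0),
       (2001, 4, -1.0), (2003, 4, 0.5), (2005, 4, 2.0)]
print(t_stats_by_position(toy))
```

With real data each of the eight groups would hold roughly 32 observations, so the small-sample caveat applies to every group separately.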

**CONCLUSIONS**

- Seasonality is a statistically significant effect:
  - Quarters 4-6 have mean returns above zero.
  - Other quarterly means are not statistically different from zero.
- A robust calendar strategy to avoid negative periods cannot be designed.


Pingback: Quantocracy's Daily Wrap for 10/15/2015 | Quantocracy

When you do 8 tests, surely you have to adjust your t-scores to reflect this. Adjust them down. Source: any decent statistics textbook.

There are 8 tests on 8 separate datasets (Q1-8); therefore, correction for multiple comparisons is not required.

If the tests were all on the same dataset, a Bonferroni correction could be applied (or something more sophisticated).

https://en.wikipedia.org/wiki/Bonferroni_correction
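As a rough illustration of what a Bonferroni correction would do to the significance threshold, the sketch below divides the significance level by the number of tests and converts it to a two-sided critical value. It uses a normal approximation from the standard library; a Student-t critical value for samples of about 32 points would be somewhat larger. The function names and the example t-values are hypothetical and are not the figures from the table.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha=0.05, n_tests=8):
    """Two-sided critical value after dividing alpha across n_tests.

    Normal approximation: an assumption made here to stay within the
    standard library, not the exact Student-t quantile.
    """
    adjusted_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - adjusted_alpha / 2)


def survives(t_values, alpha=0.05):
    """Flag which t-statistics exceed the Bonferroni-adjusted threshold."""
    crit = bonferroni_threshold(alpha, len(t_values))
    return {name: abs(t) > crit for name, t in t_values.items()}


print(round(bonferroni_threshold(), 2))      # threshold for 8 tests
print(survives({"a": 3.5, "b": 2.0}))        # hypothetical t-values
```

The adjusted threshold for 8 tests is noticeably higher than the usual value of about 2, which is why the correction matters for results that are only just significant.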

I think you miss the point here. When you do multiple tests, even with independent data, you need to assess the overall result in a collective fashion.

Let’s say you test 1,000 different people for clairvoyance. The data are independent. 2 or 3 show ‘significant’ results. But it is nonsense.

You cannot say “some of the results were significant but others were not”. The whole suite of tests has to be assessed together. Not to mention any other data trawling you may have conducted before finding the “significant” result.

If you report the “most significant” results, then they are all part of the same test and need to be assessed collectively.

Proof: I take returns for 100 different random series of data. Some will be “significant”, but that is meaningless because the data are known to be random.
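This thought experiment is easy to run. The sketch below generates 100 series of pure Gaussian noise, 32 points each, and counts how many have |t| > 2. The series length, the threshold, and the Gaussian generator are assumptions chosen to mirror the discussion; with these settings roughly 1 series in 20 is expected to look “significant” by chance.

```python
import math
import random
import statistics


def t_stat(xs):
    """One-sample t-statistic for the hypothesis that the mean is zero."""
    return statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))


def count_false_positives(n_series=100, n_points=32, threshold=2.0, seed=0):
    """Count zero-mean noise series whose |t| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_series):
        series = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if abs(t_stat(series)) > threshold:
            hits += 1
    return hits


print(count_false_positives())
```

Changing `seed` gives a different count each time, but it is rarely zero, which is the point being made.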

I suspect there is some signal left when you do this properly but not as much as it currently appears.

The best way is to bootstrap 1000 datasets of 32 randomly selected points from the data and run Student-t tests for each quarter against those datasets.
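One possible reading of this procedure, sketched below: resample 32 points at a time from the pooled quarterly returns, compute a t-statistic for each resample, and see how often a random set of quarters does at least as well as the quarter being tested. The pooled data here is synthetic, and the function names, the seed, and the one-sided comparison are assumptions made for illustration.

```python
import math
import random
import statistics


def t_stat(xs):
    """One-sample t-statistic for the hypothesis that the mean is zero."""
    return statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))


def bootstrap_t_stats(pooled, size=32, n_boot=1000, seed=0):
    """t-statistics of n_boot resamples (with replacement) of the pooled returns."""
    rng = random.Random(seed)
    return [t_stat(rng.choices(pooled, k=size)) for _ in range(n_boot)]


def empirical_p(observed_t, null_ts):
    """Share of bootstrap t-statistics at least as large as the observed one.

    The +1 terms keep the estimate away from exactly zero.
    """
    hits = sum(1 for t in null_ts if t >= observed_t)
    return (hits + 1) / (len(null_ts) + 1)


# Synthetic stand-in for the pooled returns (NOT the Ken French data).
rng = random.Random(42)
pooled = [rng.gauss(2.0, 8.0) for _ in range(256)]
null_ts = bootstrap_t_stats(pooled)
print(empirical_p(2.3, null_ts))  # 2.3 is a placeholder t-statistic
```

Because the resamples come from the actual return distribution, this approach does not rely on the normality assumption behind the plain t-test.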

The significance would certainly be lower, reinforcing the conclusion that the effect cannot be profited from.