Matching death counts to Sundbarg’s data for 1861-1900
Deaths of unknown age were already included in Sundbarg’s published data (by 5-year age groups) for 1751-1900. For the period 1861-1900, the official data (by single-year age groups) listed deaths of unknown age separately. In addition, there were some slight inconsistencies between the two data sources for a limited number of years and age groups during this period of overlap.
In cases of disagreement, we chose to accept Sundbarg’s data as correct. At the same time, however, we wanted to retain the greater detail by age of the official data. Thus, we forced agreement between Sundbarg’s 5-year data and the official single-year data. More precisely, any difference between the two data sources was distributed proportionately into the official single-year data, in such a manner that the total number within each 5-year age group exactly matched Sundbarg’s numbers. This single step had the effect of both distributing the deaths of unknown age (according to Sundbarg’s judgments about where they belonged) and removing the minor inconsistencies between the two sources.
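The proportional adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `force_group_agreement` and all numbers are hypothetical, and the sketch handles a single 5-year age group in a single year.

```python
def force_group_agreement(single_year_counts, target_total):
    """Scale the single-year counts of one 5-year age group so they sum to target_total."""
    observed_total = sum(single_year_counts)
    if observed_total <= 0:
        raise ValueError("proportional distribution needs a positive observed total")
    factor = target_total / observed_total
    return [count * factor for count in single_year_counts]


# Hypothetical numbers for ages 5-9 in one year (not taken from the source).
official_single_year = [120, 100, 90, 95, 85]  # official data, sums to 490
sundbarg_five_year_total = 500                 # 5-year total, unknown ages included

adjusted = force_group_agreement(official_single_year, sundbarg_five_year_total)
```

Because every age in the group is multiplied by the same factor, the relative age pattern of the official data is preserved while the group total matches the 5-year figure.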
After 1900, published deaths contained a category of "age unknown" for the year 1945 only. These deaths were distributed proportionately across the entire age range (separately for males and females).
Similarly, census counts during 1860-1900 contained an "age unknown" category in years 1860, 1870, and 1880. In all cases, these numbers were distributed proportionately across the age range by sex.
Estimating January 1 Population
The method of extinct cohorts was used for cohorts that could reasonably be thought to be extinct by the end of 1995. By convention, cohorts that had attained age 110 or older by 1995 (thus, those born in 1885 or earlier) were considered "extinct". For these extinct cohorts, population estimates for ages 80 and above were obtained by cumulating deaths backwards according to the standard technique (Vincent, 1951).
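A minimal sketch of the backward cumulation may help. It assumes that migration is negligible at these ages and that the cohort's deaths have already been tabulated by calendar year; the function name and the numbers are hypothetical.

```python
def extinct_cohort_population(deaths_by_year):
    """Return the January 1 size of one extinct cohort for each calendar year.

    deaths_by_year maps a calendar year to the number of cohort members who
    died during that year. Everyone alive on January 1 of year t must die in
    year t or later, so the population is the sum of all later deaths.
    """
    population = {}
    running_total = 0
    for year in sorted(deaths_by_year, reverse=True):
        running_total += deaths_by_year[year]
        population[year] = running_total
    return population


# Hypothetical last four years of life of one cohort.
cohort_deaths = {1990: 5, 1991: 3, 1992: 2, 1993: 1}
jan1_population = extinct_cohort_population(cohort_deaths)
```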
For all cohorts or periods, population estimates below age 80 were obtained by standard intercensal cohort survival methods (e.g., Vallin, 1973; Pressat, 1980). In addition, this method was used to derive population estimates above age 80 for non-extinct cohorts (born in 1886 or later).
Briefly, population estimates using the intercensal survival method are derived for each individual cohort using data from two successive censuses and death counts in the intervening period. For each cohort, deaths are subtracted from its population size in the first census to obtain an estimated population size at the time of the second census. The difference between the actual and estimated population size (at the time of the second census) is then computed. This difference is distributed evenly across the intercensal period in order to obtain final estimates of population size for the cohort on January 1 of each year.
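The intercensal survival calculation for a single cohort can be sketched as below. This is a simplified illustration, assuming that both censuses refer to January 1 and that the cohort's deaths are available for each intervening calendar year; the function name and the numbers are hypothetical.

```python
def intercensal_estimates(pop_first, pop_second, deaths_by_year):
    """Estimate a cohort's January 1 population between two censuses.

    deaths_by_year lists the cohort's deaths in each year of the intercensal
    period, in chronological order.
    """
    n_years = len(deaths_by_year)
    projected = pop_first - sum(deaths_by_year)  # expected size at second census
    discrepancy = pop_second - projected         # actual minus estimated
    estimates = [float(pop_first)]
    survivors = pop_first
    for k, deaths in enumerate(deaths_by_year, start=1):
        survivors -= deaths
        # Spread the discrepancy evenly: k/n of it has accrued after k years.
        estimates.append(survivors + discrepancy * k / n_years)
    return estimates


# Hypothetical cohort: 1000 at the first census, 960 five years later.
estimates = intercensal_estimates(1000, 960, [10, 12, 11, 9, 8])
```

By construction the first and last values reproduce the two census counts, and the intermediate values absorb the unexplained difference in equal annual steps.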
In deriving estimates of population size using the intercensal cohort survival method, we used the results of the extinct cohort estimation for ages 80 and above rather than from the original census counts (or estimates). This substitution was made because of our belief that the former is a more reliable source of information on population size above age 80. Furthermore, this technique helps to avoid noticeable discontinuities around age 80 in our final estimates of population size.
Splitting grouped data into smaller age categories
For early time periods, published death counts and population figures (from censuses) are available only in a variety of aggregated formats. For example, prior to 1861, raw data for both deaths and population are available only by 5-year age groups (with some exceptions at the youngest and oldest ages). During the period 1861-1900, death counts are broken down by single years of age but not by Lexis triangle. In this section, we explain our methods for estimating, where necessary, census counts by single years of age and death counts in Lexis triangles.
For years 1751-1860, data in 5-year age groups (except for the youngest and oldest ages) were split into single-year age groups using a method of linear interpolation applied to logarithms of the raw data. For death counts during this period, the given age intervals were usually 0, 1-2, 3-4, 5-9, 10-14, ..., 85-89, and 90+. For some years, however, raw data above age 90 were available in greater detail: 90-94, 95-99, and 100+. In all situations, we used the greatest level of detail available in the raw data.
Before interpolating, the raw data were scaled by dividing the death counts by the length of the age interval (the 90+ and 100+ categories were divided, arbitrarily, by 15 and 5 years, respectively). Thus, the resulting estimates are scaled to single-year age intervals. In applying the linear interpolation method, the midpoint of an age interval gives the x-coordinate, while the log of the scaled death count gives the y-coordinate.
Several additional notes are needed regarding this procedure. First, since raw data for age 0 were available, no estimate was needed for this age. Second, we determined that it was not useful to include age 0 in the linear interpolation model to obtain estimates for age 1, since the number of deaths at age 0 is atypical. Instead, the estimate for age 1 is derived by a slight extrapolation, using the raw data for age groups 1-2 and 3-4. Third, in deriving estimates for ages 88-104, the x-coordinate for the open 90+ category was (arbitrarily) set equal to 95; technically, then, estimates for ages 88-94 were derived by interpolation, while those for 95-104 were derived by extrapolation. (When the open interval is 100+ rather than 90+, the x-coordinate was set equal to 102.5.) For obvious reasons, we do not have great confidence in the accuracy of these estimates at very high ages. At the same time, estimates at such high ages in this era do not matter much for such important indices as life expectancy at birth (from either a period or cohort perspective).
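The interpolation step can be sketched as follows. This is an illustration rather than the original implementation: it assumes that each single-year estimate is evaluated at the midpoint of its own age interval (x + 0.5), that all counts are positive so the logarithm exists, and that only closed intervals are supplied. The function name and the numbers are hypothetical.

```python
import math


def loglinear_split(groups, ages):
    """Interpolate log(deaths per year of age) linearly between group midpoints.

    groups: list of (start_age, width, count) in increasing age order.
    ages:   single ages for which an initial estimate is wanted.
    """
    xs = [start + width / 2 for start, width, _ in groups]
    ys = [math.log(count / width) for _, width, count in groups]
    estimates = {}
    for age in ages:
        x = age + 0.5  # assumed evaluation point: midpoint of the single-year interval
        # Find the segment that brackets x; the end segments also serve for
        # extrapolation below the first or above the last midpoint.
        i = 0
        while i < len(xs) - 2 and x > xs[i + 1]:
            i += 1
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        estimates[age] = math.exp(ys[i] + slope * (x - xs[i]))
    return estimates


# Hypothetical death counts for age groups 5-9, 10-14 and 15-19.
groups = [(5, 5, 500), (10, 5, 250), (15, 5, 250)]
initial = loglinear_split(groups, range(5, 20))
```

These initial estimates do not yet sum to the group totals; that is handled by the later rescaling step.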
By this method of log-linear interpolation, we obtained initial estimates of death counts by age, $\hat{D}_x^{(0)}$, where $x$ refers to age at death (by single years). These estimates were multiplied by adjustment factors to take account of differences in cohort size, yielding a second set of estimates, $\hat{D}_x^{(1)}$. After the correction for relative cohort size, a further adjustment was made to ensure that the estimated death counts add up to the original total number of deaths in each age group, yielding $\hat{D}_x^{(2)}$. These values were the final estimates for all single-year ages except ages 1 and 2, which required a final ad hoc adjustment. These three successive modifications of $\hat{D}_x^{(0)}$ are explained below.
First, consider the cohort size adjustment, which is applied to the initial estimates $\hat{D}_x^{(0)}$ (derived using the log-linear interpolation method). We begin by assuming that the formula for this adjustment should contain the quantity
$$R_x = \frac{B_x}{\bar{B}_x},$$
where $B_x$ equals the mean of birth counts for the two cohorts whose members are age $x$ at some time during the year in question, and $\bar{B}_x$ equals the mean of the five values of $B_x$ in the associated 5-year age group. Thus, $R_x$ gives the average size of the two cohorts whose deaths may have occurred at age $x$ in this year relative to the (weighted) average size of the six cohorts whose deaths may have occurred in the associated 5-year age group.
Now, suppose that all birth cohorts represented in these averages are of equal size. We call this scenario the "constant births" model. In this situation, $R_x$ would equal one, and $\hat{D}_x^{(0)}$ should already be a good estimate of the true death count, $D_x$. If these birth cohorts are not of equal size, however, then we need to adjust $\hat{D}_x^{(0)}$. Relative to the constant births model, the error in the estimated death count equals $(D_x - \hat{D}_x^{(0)}) / \hat{D}_x^{(0)}$ (thus, the absolute error relative to our initial estimate). Analogously, the relative difference between the observed $R_x$ and its value in the constant births model equals $R_x - 1$. We might reasonably suspect that these two quantities would be closely related, and our empirical investigation confirms that their relationship can be expressed by a simple regression model:
$$\frac{D_x - \hat{D}_x^{(0)}}{\hat{D}_x^{(0)}} = a + b\,(R_x - 1).$$
Furthermore, our empirical analyses have shown that the intercept $a$ in the equation is very close to zero and can thus be ignored. As shown in Table 1, the slope $b$ is close to one for years 1861-1900, which is the most relevant period for our purposes. Therefore, we can conclude that
$$D_x \approx \hat{D}_x^{(0)} R_x,$$
and thus let $\hat{D}_x^{(1)} = \hat{D}_x^{(0)} R_x$, which completes the adjustment of the estimated death counts for differences in cohort size.
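After making this cohort size adjustment, the values of $\hat{D}_x^{(1)}$ (the cohort-size-adjusted estimates) need to be adjusted further to ensure that the estimates add up to the original totals for the age groups of the raw data. For example, if age $x$ belongs to an age group $G$ of the raw data whose reported total is $D_G$, then
$$\hat{D}_x^{(2)} = \hat{D}_x^{(1)} \cdot \frac{D_G}{\sum_{y \in G} \hat{D}_y^{(1)}}.$$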
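The two adjustments just described (cohort size, then rescaling to the reported group total) can be sketched together for one 5-year age group. The sketch assumes that the mean birth count of the two relevant cohorts has already been computed for each single age; all names and numbers are hypothetical.

```python
def cohort_size_ratio(mean_births_by_age, group_ages):
    """Ratio of each age's mean cohort size to the mean over its age group."""
    group_mean = sum(mean_births_by_age[x] for x in group_ages) / len(group_ages)
    return {x: mean_births_by_age[x] / group_mean for x in group_ages}


def adjust_and_rescale(initial, mean_births_by_age, group_ages, group_total):
    """Apply the cohort size factor, then force the group to sum to group_total."""
    ratio = cohort_size_ratio(mean_births_by_age, group_ages)
    adjusted = {x: initial[x] * ratio[x] for x in group_ages}
    scale = group_total / sum(adjusted.values())
    return {x: adjusted[x] * scale for x in group_ages}


# Hypothetical inputs for ages 5-9.
ages = [5, 6, 7, 8, 9]
initial_estimates = {5: 30.0, 6: 25.0, 7: 20.0, 8: 15.0, 9: 10.0}
mean_births = {5: 110.0, 6: 100.0, 7: 100.0, 8: 100.0, 9: 90.0}

final = adjust_and_rescale(initial_estimates, mean_births, ages, group_total=204.0)
```

In this example the larger cohorts at age 5 raise that estimate by 10 percent and the smaller cohorts at age 9 lower it by 10 percent before the common rescaling factor is applied.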
Finally, a small inadequacy in the interpolation method affecting the results for ages 1 and 2 was noted. The estimated number of deaths for age 1 was consistently too low, while the opposite occurred for age 2. An ad hoc correction procedure was developed based on an analysis of more detailed data for years 1861-1890 and then applied to the results for all earlier years. We first computed the average error in the estimated deaths for age 1 relative to the total number of deaths for ages 1-2. This quantity was fairly constant during the period 1861-1890 (rising noticeably thereafter), and thus we computed its average over these years alone:
$$\bar{e} = \frac{1}{30} \sum_{t=1861}^{1890} \frac{D_1(t) - \hat{D}_1^{(2)}(t)}{D_1(t) + D_2(t)},$$
where $D_1$ and $D_2$ are the actual deaths for age 1 and 2, respectively, and $\hat{D}_1^{(2)}$ is the current estimate of the death count for age 1. Thus, for females, for whom this average equals 0.0311, $\hat{D}_1^{(2)}$ and $\hat{D}_2^{(2)}$ need to be adjusted in opposite directions by an amount equal to 0.0311 times the total number of deaths for ages 1-2, which is known even during the period 1751-1860:
$$\hat{D}_1^{(3)} = \hat{D}_1^{(2)} + 0.0311\,(D_1 + D_2)$$
and
$$\hat{D}_2^{(3)} = \hat{D}_2^{(2)} - 0.0311\,(D_1 + D_2),$$
where $\hat{D}_1^{(3)}$ and $\hat{D}_2^{(3)}$ are the final estimates of death counts for ages 1 and 2. For males, the adjustment is made using a correction factor of 0.0326.
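As a small worked illustration of this ad hoc correction, the sketch below shifts deaths from age 2 to age 1 using the correction factors quoted in the text (0.0311 for females, 0.0326 for males). The function name and the example counts are hypothetical.

```python
CORRECTION_FACTOR = {"female": 0.0311, "male": 0.0326}


def correct_ages_1_and_2(estimate_age1, estimate_age2, total_ages_1_2, sex):
    """Raise the age 1 estimate and lower the age 2 estimate by the same amount."""
    shift = CORRECTION_FACTOR[sex] * total_ages_1_2
    return estimate_age1 + shift, estimate_age2 - shift


# Hypothetical estimates for females: 600 deaths at age 1 and 400 at age 2,
# from a reported total of 1000 deaths in age group 1-2.
final_age1, final_age2 = correct_ages_1_and_2(600.0, 400.0, 1000.0, "female")
```

Since the same amount is added to one age and subtracted from the other, the total for the age group 1-2 is unchanged.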
Population data during 1751-1860, based on census counts at 5-year intervals, were available only in 5-year age groups (0-4, 5-9, 10-14, ..., 85-89, 90+). These numbers were split into single-year age groups by nearly the same method used for the death counts. In fact, it was easier to apply the procedure in this case, since the available age groups were more regular over age and time.
First, initial estimates, $\hat{P}_x^{(0)}$, were obtained by log-linear interpolation. Second, these estimates were adjusted by the formula
$$\hat{P}_x^{(1)} = \hat{P}_x^{(0)} R_x.$$
Note that the cohort size adjustment factor, $R_x$, is slightly different from that in the previous case. Here,
$$R_x = \frac{B_x}{\bar{B}_x},$$
where $B_x$ equals the birth count of the cohort that is aged $x$ to $x+1$ at the time of the census (which occurs on December 31 of the census year), and $\bar{B}_x$ equals the mean of the five values of $B_x$ in the associated 5-year age group. Third, this set of estimates was adjusted further to ensure that totals within 5-year age groups add up to their original values (see method described earlier for the deaths). For the census estimates, there was no need to make a final adjustment for the age group 1-2.
After splitting 5-year death data for years 1751-1860 into single-year age groups, we then split single-year death data for years 1751-1900 into Lexis triangles using a linear regression model. For years 1901-1991, raw data by individual triangles were available at all ages, and these data were then used to derive the model that was subsequently applied for splitting the single-year data for earlier years.
Many linear models were fit to the raw triangle data for 1901-1991, and seven of these models are summarized in Tables 2a (females) and 2b (males). In all cases, the dependent variable in these models was the proportion of deaths in each 1x1 Lexis square contained in the lower (right-hand) triangle of that square. All models were fit by weighted least squares, where the weights were proportional to the number of deaths in the entire Lexis square.
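To make the fitting criterion concrete, the sketch below fits a weighted least squares line with a single predictor and reports the weighted R-squared (the share of weighted variance about the weighted mean that the fit explains). It is a deliberately reduced illustration: the models in Tables 2a and 2b contain many more terms, and the function name and data here are hypothetical.

```python
def weighted_least_squares(x, y, weights):
    """Fit y = intercept + slope * x by weighted least squares."""
    total_weight = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total_weight
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total_weight
    s_xy = sum(w * (xi - x_mean) * (yi - y_mean) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    residual = sum(w * (yi - intercept - slope * xi) ** 2
                   for w, xi, yi in zip(weights, x, y))
    total = sum(w * (yi - y_mean) ** 2 for w, yi in zip(weights, y))
    r_squared = 1.0 - residual / total
    return intercept, slope, r_squared


# Hypothetical Lexis squares: birth proportion (x), lower-triangle death
# proportion (y) and total deaths in the square (weight).
birth_share = [0.48, 0.50, 0.52]
death_share = [0.1 + 0.8 * b for b in birth_share]  # constructed to lie on a line
square_deaths = [100, 300, 200]

intercept, slope, r_squared = weighted_least_squares(birth_share, death_share, square_deaths)
```

Weighting by total deaths means that sparsely populated squares, whose observed proportions are noisy, have little influence on the fitted coefficients.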
All of the coefficients in the most comprehensive model, Model VII, are statistically significant at the 0.001 level. Thus, all the terms included in these models have some descriptive value. The age pattern of variation in the proportion of lower-triangle deaths (higher than average in the earliest and latest years of life) is easily visible from an inspection of the age coefficients for Models I-IV. For the more complicated models, it is necessary to combine the effects of the interaction terms with the age coefficients in order to observe this same age pattern. Note: These linear models were fit using both single-year age groups (0-104) and 5-year age groups (only the latter are shown in Tables 2a and 2b). For all computational purposes, we used the models with single-year coefficients. The models with 5-year coefficients are used for presentation purposes only, although in fact they produce only slightly inferior results if used for computation as well.
The next variable in these models is labeled "birth proportion," which is the number of births in the younger cohort (corresponding to the lower triangle of the Lexis square) as a proportion of the total births in the two cohorts that traverse the Lexis square in question. As such, this variable is analogous to the "death proportion," which is the dependent variable in these models. Therefore, a positive coefficient is a reasonable result.
The death proportions tend to increase over the observation period, as reflected in the positive coefficient for the variable "year" in Models IV through VII. This increase is much stronger for age 0, as seen in the interaction term. A curious result, however, is that the rapid increase in the death proportion for age 0 reverses itself around 1965. This change is captured in the next set of interaction terms and can be observed graphically in Figure 1a. A final pair of coefficients is included in order to reflect the impact of the Spanish flu epidemic during the winter of 1918-19. The increase in the death rate during that winter greatly elevated the proportion of deaths in lower triangles for 1918 (dominated by months at the end of the year) and in upper triangles for 1919 (dominated by months at the beginning of the year).
Having fit these seven models, we may be tempted to extrapolate the proportion of lower-triangle deaths using the best-fitting model, Model VII, into the earlier time periods (1751-1900). An obvious problem with this approach is well illustrated in Figures 1a, 1b, and 1c, since the predictions outside of the observation period quickly become implausible. Especially for age 0, but for other ages as well, it does not seem believable that the proportion of lower-triangle deaths in earlier time periods could have been substantially below 0.5, as predicted by an extrapolation of Model VII.
A visual inspection of the raw data in the earliest portion of the observation period, 1901-1910, seems to show no trend over time, especially for age 0 (which is less affected by random variation because of the large number of deaths). This result suggests that the levels of these death proportions may not have increased substantially in earlier years. Therefore, we chose to derive a predicted proportion of lower-triangle deaths for years 1751-1900 using Model VII but holding the value of the year variable constant at 1910. This assumption eliminates the implausible time trend for years before 1900 and yields predictions of the level of the death proportion that are similar to those observed in 1901-1910. These predictions still include the effect of variations in the birth proportion variable when available. Note: Birth counts were available back to 1749 only. For earlier cohorts, we set the birth proportion variable equal to 0.5. Thus, at advanced ages, the predicted proportion of lower-triangle deaths was constant prior to some time period.
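The prediction rule described above can be sketched as follows. The coefficient values are purely hypothetical placeholders (they are not the Table 2 estimates), the model is reduced to an intercept, an age effect, the birth proportion and the year, and all function names are assumptions made for illustration.

```python
def birth_proportion(births_by_year, year, age):
    """Share of births in the younger of the two cohorts crossing the Lexis square.

    Falls back to 0.5 when either birth count is unavailable.
    """
    younger = births_by_year.get(year - age)      # lower-triangle cohort
    older = births_by_year.get(year - age - 1)    # upper-triangle cohort
    if younger is None or older is None:
        return 0.5
    return younger / (younger + older)


def predicted_lower_share(coef, age_effect, year, birth_share, frozen_year=1910):
    """Predict the lower-triangle share, freezing the year term before 1901."""
    model_year = frozen_year if year <= 1900 else year
    return (coef["intercept"] + age_effect
            + coef["birth_proportion"] * birth_share
            + coef["year"] * model_year)


def split_into_triangles(deaths, lower_share):
    """Split single-year deaths into (lower, upper) Lexis triangle counts."""
    lower = deaths * lower_share
    return lower, deaths - lower


# Hypothetical coefficients and data, for illustration only.
coef = {"intercept": -0.7, "birth_proportion": 0.8, "year": 0.0004}
births = {1849: 100, 1850: 100}

share_1850 = predicted_lower_share(coef, 0.0, 1850, birth_proportion(births, 1850, 0))
lower, upper = split_into_triangles(1000.0, share_1850)
```

With the year frozen, any remaining variation in the predicted share before 1901 comes only from age and from the relative size of the two birth cohorts.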
The method employed for splitting data from 5-year age groups into single-year age groups was the third of three methods that we tested for this purpose. We refer to the final method as "log-linear," since it consists of a linear interpolation applied to logarithms of the raw data. The other two methods are called "linear" and "spline." The linear method differs from the log-linear only in the fact that it does not employ logarithms: in other words, a linear interpolation is applied directly to the raw data. The spline method is also applied to the raw data (without logarithms). Instead of a simple linear interpolation, however, it involves a more complicated method of fitting moving sequences of cubic splines to raw death (or census) counts. After various comparisons, we determined that the log-linear method was, in general, at least as accurate as (and in some cases more accurate than) the other two methods. It also has a clear advantage of simplicity compared with the spline method.
Unfortunately, it was not possible to use the same method for the two steps in the total splitting process. In other words, our best method for splitting data from 5-year to single-year age groups is fundamentally different from our method for splitting single-year age groups into Lexis triangles. Of course, this distinction matters only for the death counts. All of the variants of the interpolation method have the advantage of simplicity compared to the linear regression model, and we would have preferred to apply one of these methods (in particular, the log-linear interpolation method) also for splitting single-year data into Lexis triangles. The interpolation methods failed in this case, however, and we were forced to employ a more complicated strategy involving a linear regression model. Briefly, the reason that the simpler interpolation methods do not work for this purpose is that the pattern of death counts by Lexis triangle is affected not only by age, but also by the seasonality of mortality (since upper-triangle deaths are weighted disproportionately toward the more lethal winter months).
Table 2a
Seven Linear Models of the Proportion of Lower-Triangle Deaths *
Swedish Females, Ages 0-104, Years 1901-1991

| | I | II | III | IV | V | VI | VII |
|---|---|---|---|---|---|---|---|
| Intercept | 0.5124 | 0.1580 | 0.1584 | 0.6478 | 0.6678 | 0.6999 | 0.8036 |
| Age groups ** | | | | | | | |
| 0 | 0.2207 | 0.2215 | 0.2223 | 0.2272 | 5.5836 | 6.2221 | 6.1810 |
| 1 | 0.0551 | 0.0560 | 0.0555 | 0.0644 | 0.3261 | 0.3551 | 0.3538 |
| 2-4 | 0.0038 | 0.0029 | 0.0042 | 0.0049 | 0.2667 | 0.2957 | 0.2934 |
| 5-9 | 0.0098 | 0.0091 | 0.0109 | 0.0018 | 0.2601 | 0.2892 | 0.2862 |
| 10-14 | 0.0207 | 0.0200 | 0.0221 | 0.0131 | 0.2490 | 0.2780 | 0.2746 |
| 15-19 | 0.0156 | 0.0150 | 0.0180 | 0.0090 | 0.2533 | 0.2824 | 0.2780 |
| 20-24 | 0.0130 | 0.0126 | 0.0159 | 0.0071 | 0.2554 | 0.2844 | 0.2796 |
| 25-29 | 0.0131 | 0.0129 | 0.0165 | 0.0079 | 0.2547 | 0.2838 | 0.2785 |
| 30-34 | 0.0133 | 0.0134 | 0.0160 | 0.0095 | 0.2534 | 0.2825 | 0.2783 |
| 35-39 | 0.0176 | 0.0175 | 0.0185 | 0.0146 | 0.2486 | 0.2776 | 0.2751 |
| 40-44 | 0.0182 | 0.0183 | 0.0181 | 0.0168 | 0.2468 | 0.2759 | 0.2744 |
| 45-49 | 0.0163 | 0.0166 | 0.0158 | 0.0166 | 0.2475 | 0.2765 | 0.2756 |
| 50-54 | 0.0186 | 0.0187 | 0.0177 | 0.0196 | 0.2447 | 0.2738 | 0.2730 |
| 55-59 | 0.0147 | 0.0147 | 0.0135 | 0.0164 | 0.2481 | 0.2772 | 0.2764 |
| 60-64 | 0.0192 | 0.0194 | 0.0181 | 0.0219 | 0.2429 | 0.2719 | 0.2713 |
| 65-69 | 0.0206 | 0.0209 | 0.0194 | 0.0240 | 0.2409 | 0.2700 | 0.2694 |
| 70-74 | 0.0226 | 0.0231 | 0.0215 | 0.0268 | 0.2383 | 0.2673 | 0.2668 |
| 75-79 | 0.0231 | 0.0237 | 0.0220 | 0.0282 | 0.2371 | 0.2662 | 0.2657 |
| 80-84 | 0.0223 | 0.0230 | 0.0213 | 0.0286 | 0.2370 | 0.2660 | 0.2654 |
| 85-89 | 0.0151 | 0.0159 | 0.0140 | 0.0229 | 0.2432 | 0.2722 | 0.2716 |
| 90-94 | 0.0045 | 0.0053 | 0.0034 | 0.0139 | 0.2526 | 0.2816 | 0.2808 |
| 95-99 | 0.0056 | 0.0050 | 0.0070 | 0.0054 | 0.2616 | 0.2906 | 0.2897 |
| 100-104 | 0.0207 | 0.0202 | 0.0222 | 0.0079 | 0.2755 | 0.3045 | 0.3033 |
| Birth proportion *** | | 0.7088 | 0.7032 | 0.8238 | 0.7575 | 0.7628 | 0.7716 |
| Year | | | | 0.0004 | 0.0003 | 0.0003 | 0.0003 |
| Year x Age=0 | | | | | 0.0032 | 0.0035 | 0.0035 |
| Year ≥ 1966 x Age=0 | | | | | | 14.6083 | 14.6769 |
| Year ≥ 1966 x Age=0 x Year | | | | | | 0.0074 | 0.0074 |
| Spanish Flu | | | | | | | |
| Year=1918 | | | 0.1090 | | | | 0.1173 |
| Year=1919 | | | 0.0258 | | | | 0.0168 |
| R-squared **** (5-year ages) | 0.6465 | 0.6570 | 0.7004 | 0.6785 | 0.7259 | 0.7285 | 0.7769 |
| R-squared **** (1-year ages) | 0.6444 | 0.6549 | 0.6983 | 0.6765 | 0.7238 | 0.7265 | 0.7749 |

*  All models were fit by weighted least squares, with weights proportional to total deaths in each 1x1 Lexis square.
**  Coefficients for age groups are constrained to sum to zero; thus there is no omitted category.
***  The birth proportion is analogous to the death proportion (dependent variable) in each 1x1 Lexis square (see text for further explanation).
****  R-squared in this case is the proportion of weighted variance (about the weighted mean) that is explained by the model.
Table 2b
Seven Linear Models of the Proportion of Lower-Triangle Deaths *
Swedish Males, Ages 0-104, Years 1901-1991

| | I | II | III | IV | V | VI | VII |
|---|---|---|---|---|---|---|---|
| Intercept | 0.5185 | 0.1329 | 0.1347 | 0.6654 | 0.6242 | 0.6458 | 0.7406 |
| Age groups ** | | | | | | | |
| 0 | 0.2253 | 0.2261 | 0.2267 | 0.2327 | 5.6098 | 5.980 | 5.9432 |
| 1 | 0.0444 | 0.0453 | 0.0452 | 0.0545 | 0.3166 | 0.3335 | 0.3327 |
| 2-4 | 0.0068 | 0.0059 | 0.0067 | 0.0023 | 0.2647 | 0.2816 | 0.2799 |
| 5-9 | 0.0060 | 0.0054 | 0.0067 | 0.0018 | 0.2647 | 0.2815 | 0.2792 |
| 10-14 | 0.0155 | 0.0148 | 0.0162 | 0.0081 | 0.2549 | 0.2718 | 0.2692 |
| 15-19 | 0.0252 | 0.0247 | 0.0269 | 0.0196 | 0.2440 | 0.2608 | 0.2574 |
| 20-24 | 0.0129 | 0.0126 | 0.0158 | 0.0078 | 0.2560 | 0.2728 | 0.2682 |
| 25-29 | 0.0033 | 0.0032 | 0.0068 | 0.0009 | 0.2649 | 0.2817 | 0.2765 |
| 30-34 | 0.0057 | 0.0057 | 0.0083 | 0.0029 | 0.2616 | 0.2784 | 0.2742 |
| 35-39 | 0.0148 | 0.0146 | 0.0156 | 0.0129 | 0.2519 | 0.2688 | 0.2663 |
| 40-44 | 0.0209 | 0.0210 | 0.0209 | 0.0204 | 0.2449 | 0.2617 | 0.2603 |
| 45-49 | 0.0227 | 0.0230 | 0.0223 | 0.0235 | 0.2421 | 0.2589 | 0.2581 |
| 50-54 | 0.0211 | 0.0209 | 0.0200 | 0.0224 | 0.2436 | 0.2604 | 0.2597 |
| 55-59 | 0.0200 | 0.0196 | 0.0185 | 0.0222 | 0.2442 | 0.2610 | 0.2603 |
| 60-64 | 0.0202 | 0.0201 | 0.0188 | 0.0236 | 0.2431 | 0.2600 | 0.2593 |
| 65-69 | 0.0208 | 0.0209 | 0.0196 | 0.0251 | 0.2419 | 0.2587 | 0.2580 |
| 70-74 | 0.0212 | 0.0216 | 0.0202 | 0.0261 | 0.2410 | 0.2578 | 0.2572 |
| 75-79 | 0.0234 | 0.0239 | 0.0225 | 0.0286 | 0.2386 | 0.2554 | 0.2548 |
| 80-84 | 0.0225 | 0.0234 | 0.0221 | 0.0283 | 0.2389 | 0.2558 | 0.2551 |
| 85-89 | 0.0165 | 0.0174 | 0.0160 | 0.0229 | 0.2446 | 0.2614 | 0.2608 |
| 90-94 | 0.0080 | 0.0090 | 0.0075 | 0.0156 | 0.2523 | 0.2691 | 0.2683 |
| 95-99 | 0.0081 | 0.0073 | 0.0088 | 0.0010 | 0.2675 | 0.2843 | 0.2834 |
| 100-104 | 0.0298 | 0.0288 | 0.0308 | 0.0187 | 0.2879 | 0.3047 | 0.3041 |
| Birth proportion *** | | 0.7712 | 0.7640 | 0.8875 | 0.8033 | 0.8112 | 0.8177 |
| Year | | | | 0.0004 | 0.0002 | 0.0002 | 0.0003 |
| Year x Age=0 | | | | | 0.0032 | 0.0034 | 0.0034 |
| Year ≥ 1966 x Age=0 | | | | | | 13.9563 | 14.019 |
| Year ≥ 1966 x Age=0 x Year | | | | | | 0.0071 | 0.0071 |
| Spanish Flu | | | | | | | |
| Year=1918 | | | 0.1022 | | | | 0.1110 |
| Year=1919 | | | 0.0359 | | | | 0.0265 |
| R-squared **** (5-year ages) | 0.7014 | 0.7122 | 0.7449 | 0.7302 | 0.7808 | 0.7830 | 0.8194 |
| R-squared **** (1-year ages) | 0.7042 | 0.7150 | 0.7476 | 0.7330 | 0.7835 | 0.7858 | 0.8222 |

*  All models were fit by weighted least squares, with weights proportional to total deaths in each 1x1 Lexis square.
**  Coefficients for age groups are constrained to sum to zero; thus there is no omitted category.
***  The birth proportion is analogous to the death proportion (dependent variable) in each 1x1 Lexis square (see text for further explanation).
****  R-squared in this case is the proportion of weighted variance (about the weighted mean) that is explained by the model.
Constructing Cohort Lifetables