Notes (Sweden)

(under construction -- remnants here need to be edited)

Adjustments to Census and Death Counts

Matching death counts to Sundbarg’s data for 1861-1900

Deaths of unknown age were already included in Sundbarg’s published data (by 5-year age groups) for 1751-1900. For the period 1861-1900, the official data (by single-year age groups) listed deaths of unknown age separately. In addition, there were some slight inconsistencies between the two data sources for a limited number of years and age groups during this period of overlap.

In cases of disagreement, we chose to accept Sundbarg’s data as correct. At the same time, however, we wanted to retain the greater detail by age of the official data. Thus, we forced agreement between Sundbarg’s 5-year data and the official single-year data. More precisely, any difference between the two data sources was distributed proportionately into the official single-year data, in such a manner that the total number within each 5-year age group exactly matched Sundbarg’s numbers. This single step had the effect of both distributing the deaths of unknown age (according to Sundbarg’s judgments about where they belonged) and removing the minor inconsistencies between the two sources.
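The proportional adjustment described above can be sketched as follows. This is only an illustration under simplifying assumptions: the function name, the regular 5-year grouping, the even-spread fallback for an empty group, and the numbers are all hypothetical and are not taken from the Swedish data.

```python
def match_group_totals(single_year, group_totals, width=5):
    """Rescale single-year counts so each age group sums to a target total.

    single_year  : counts by single year of age (hypothetical input)
    group_totals : target totals, one per `width`-year age group
    """
    adjusted = []
    for g, target in enumerate(group_totals):
        block = single_year[g * width:(g + 1) * width]
        block_sum = sum(block)
        if block_sum == 0:
            # Assumption: with nothing to scale, spread the target evenly.
            adjusted.extend([target / len(block)] * len(block))
        else:
            factor = target / block_sum
            adjusted.extend(count * factor for count in block)
    return adjusted


official = [10, 12, 8, 6, 4, 3, 3, 2, 1, 1]  # made-up single-year deaths
sundbarg = [44, 10]                          # made-up 5-year totals
matched = match_group_totals(official, sundbarg)
```

In this toy case the first group is scaled up by 10 percent (40 to 44), while the second group already agrees and is left unchanged.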

Distributing "age unknown" for deaths and censuses

After 1900, published deaths contained a category of "age unknown" for the year 1945 only. These deaths were distributed proportionately across the entire age range (separately for males and females).

Similarly, census counts during 1860-1900 contained an "age unknown" category in years 1860, 1870, and 1880. In all cases, these numbers were distributed proportionately across the age range by sex.
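A minimal sketch of this proportional redistribution, applied separately to each sex, might look like the following. The function name and the counts are hypothetical.

```python
def distribute_unknown(counts_by_age, unknown):
    """Spread an "age unknown" count across ages in proportion to known counts."""
    total = sum(counts_by_age)
    if total == 0:
        raise ValueError("no known ages to distribute over")
    factor = 1 + unknown / total
    return [count * factor for count in counts_by_age]


males = [50, 30, 20]                       # made-up counts for three ages
adjusted = distribute_unknown(males, 5)    # 5 records of unknown age
```

The adjusted counts keep the original age pattern and sum to the known total plus the unknown count.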

 

Estimating January 1 Population

The method of extinct cohorts was used for cohorts that could reasonably be thought to be extinct by the end of 1995. By convention, cohorts that had attained age 110 or older by 1995 (thus, those born in 1885 or earlier) were considered "extinct". For these extinct cohorts, population estimates for ages 80 and above were obtained by cumulating deaths backwards according to the standard technique (Vincent, 1951).
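The backward cumulation can be illustrated with a short sketch. It assumes that migration is negligible at these ages and that deaths for a single birth cohort are tabulated by calendar year up to extinction; the function name and the numbers are hypothetical.

```python
def extinct_cohort_population(deaths_by_year):
    """Return January 1 population by year for one extinct cohort.

    deaths_by_year maps calendar year -> deaths in the cohort during that
    year. The population alive on January 1 of a year equals the sum of
    all deaths in that year and every later year.
    """
    population = {}
    survivors = 0
    for year in sorted(deaths_by_year, reverse=True):
        survivors += deaths_by_year[year]
        population[year] = survivors
    return population


cohort_deaths = {1990: 5, 1991: 3, 1992: 1}   # made-up counts
jan1 = extinct_cohort_population(cohort_deaths)
```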

For all cohorts or periods, population estimates below age 80 were obtained by standard intercensal cohort survival methods (e.g., Vallin, 1973; Pressat, 1980). In addition, this method was used to derive population estimates above age 80 for non-extinct cohorts (born in 1886 or later).

Briefly, population estimates using the intercensal survival method are derived for each individual cohort using data from two successive censuses and death counts in the intervening period. For each cohort, deaths are subtracted from its population size in the first census to obtain an estimated population size at the time of the second census. The difference between the actual and estimated population size (at the time of the second census) is then computed. This difference is distributed evenly across the intercensal period in order to obtain final estimates of population size for the cohort on January 1 of each year.
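The steps above can be sketched for a single cohort as follows. This is one reading of "distributed evenly" (the discrepancy is added in equal cumulative increments), and it assumes for simplicity that both censuses fall on January 1 and that deaths are given per calendar year for the cohort. The function name and numbers are hypothetical.

```python
def intercensal_estimates(census_start, census_end, deaths_per_year):
    """January 1 estimates for one cohort between two censuses."""
    n = len(deaths_per_year)
    expected_end = census_start - sum(deaths_per_year)
    gap = census_end - expected_end      # actual minus estimated size
    estimates = [census_start]
    survivors = census_start
    for k, deaths in enumerate(deaths_per_year, start=1):
        survivors -= deaths
        # Add the share of the discrepancy accumulated after k years.
        estimates.append(survivors + gap * k / n)
    return estimates


series = intercensal_estimates(1000, 960, [10, 12, 11, 9, 8])
```

Here deaths alone would leave 950 survivors, the second census finds 960, and the difference of 10 is spread over the five years so that the series ends exactly at the second census count.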

In deriving estimates of population size using the intercensal cohort survival method, we used the results of the extinct cohort estimation for ages 80 and above rather than the original census counts (or estimates). This substitution was made because we believe that the former are a more reliable source of information on population size above age 80. Furthermore, this technique helps to avoid noticeable discontinuities around age 80 in our final estimates of population size.

illustration

 

 Splitting grouped data into smaller age categories

For early time periods, published death counts and population figures (from censuses) are available only in a variety of aggregated formats. For example, prior to 1861, raw data for both deaths and population are available only by 5-year age groups (with some exceptions at the youngest and oldest ages). During the period 1861-1900, death counts are broken down by single years of age but not by Lexis triangle. In this section, we explain our methods for estimating, where necessary, census counts by single years of age and death counts in Lexis triangles.

 

  1. From 5-year into single-year age groups (deaths and census)

    1. Death counts

      For years 1751-1860, data in 5-year age groups (except for the youngest and oldest ages) were split into single-year age groups using a method of linear interpolation applied to logarithms of the raw data. For death counts during this period, the given age intervals were usually 0, 1-2, 3-4, 5-9, 10-14, ... , 85-89, and 90+. For some years, however, raw data above age 90 were available in greater detail: 90-94, 95-99, and 100+. In all situations, we used the greatest level of detail available in the raw data.

      Before interpolating, the raw data were scaled by dividing the death counts by the length of the age interval (the 90+ and 100+ categories were divided, arbitrarily, by 15 and 5 years, respectively). Thus, the resulting estimates are scaled to single-year age intervals. In applying the linear interpolation method, the midpoint of an age interval gives the x-coordinate, while the log of the scaled death count gives the y-coordinate.

      Several additional notes are needed regarding this procedure. First, obviously, since raw data for age 0 were available, no estimate was obtained for this age. Second, however, we determined that it was not useful to include age 0 in the linear interpolation model to obtain estimates for age 1, since the number of deaths at age 0 is atypical. Instead, the estimate for age 1 is derived by a slight extrapolation, using the raw data for age groups 1-2 and 3-4. Third, in deriving estimates for ages 88-104, the x-coordinate for the open 90+ category was (arbitrarily) set equal to 95; technically, then, estimates for ages 88-94 were derived by interpolation, while those for 95-104 were derived by extrapolation. (When the open interval is 100+ rather than 90+, the x-coordinate was set equal to 102.5.) For obvious reasons, we do not have great confidence in the accuracy of these estimates at very high ages. At the same time, estimates at such high ages in this era do not matter much for such important indices as life expectancy at birth (from either a period or cohort perspective).
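The interpolation step can be sketched as below. This is an illustration only: it assumes that each single-year age x is evaluated at its own midpoint, x + 0.5, that all counts are positive, and that values outside the first and last midpoints are extrapolated along the nearest segment. The function name and the counts are hypothetical.

```python
import math
from bisect import bisect_right


def log_linear_split(groups, ages):
    """Interpolate log(scaled count) linearly between age-group midpoints.

    groups : list of (midpoint, count, interval_length), sorted by midpoint
    ages   : single-year ages for which estimates are wanted
    """
    xs = [mid for mid, _, _ in groups]
    ys = [math.log(count / length) for _, count, length in groups]
    result = {}
    for age in ages:
        x = age + 0.5
        # Segment index, clamped so that the end segments extrapolate.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        result[age] = math.exp(ys[i] + slope * (x - xs[i]))
    return result


# Made-up deaths for ages 5-9, 10-14 and 15-19.
groups = [(7.5, 500, 5), (12.5, 250, 5), (17.5, 300, 5)]
estimates = log_linear_split(groups, range(5, 20))
```

At the midpoint ages (7 and 12 in this example) the estimate equals the scaled count of the group, that is, the group total divided by 5.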

      By this method of log-linear interpolation, we obtained initial estimates of death counts by age, $\hat{D}_x^{(1)}$, where x refers to age at death (by single years). These estimates were multiplied by adjustment factors to take account of differences in cohort size, yielding a second set of estimates, $\hat{D}_x^{(2)}$. After the correction for relative cohort size, a further adjustment was made to ensure that the estimated death counts add up to the original total number of deaths in each age group, yielding $\hat{D}_x^{(3)}$. These values were the final estimates for all single-year ages except ages 1 and 2, which required a final ad hoc adjustment. These three successive modifications of $\hat{D}_x^{(1)}$ are explained below.

      First, consider the cohort size adjustment, which is applied to the initial estimates (derived using the log-linear interpolation method). We begin by assuming that the formula for this adjustment should contain the quantity

      $$r_x = \frac{B_x}{\bar{B}_x} ,$$

      where $B_x$ equals the mean of birth counts for the two cohorts whose members are age x at some time during the year in question, and $\bar{B}_x$ equals the mean of the five values of $B_x$ in the associated 5-year age group. Thus, $r_x$ gives the average size of the two cohorts whose deaths may have occurred at age x in this year relative to the (weighted) average size of the six cohorts whose deaths may have occurred in the associated 5-year age group.

      Now, suppose that all birth cohorts represented in these averages are of equal size. We call this scenario the "constant births" model. In this situation, $r_x$ would equal one, and $\hat{D}_x^{(1)}$ should already be a good estimate of the true death count, $D_x$. If these birth cohorts are not of equal size, however, then we need to adjust $\hat{D}_x^{(1)}$. Relative to the constant births model, the error in the estimated death count equals $(D_x - \hat{D}_x^{(1)}) / \hat{D}_x^{(1)}$ (thus, the absolute error relative to our initial estimate). Analogously, the relative difference between the observed $r_x$ and its value in the constant births model equals $r_x - 1$. We might reasonably suspect that these two quantities would be closely related, and our empirical investigation confirms that their relationship can be expressed by a simple regression model:

      $$\frac{D_x - \hat{D}_x^{(1)}}{\hat{D}_x^{(1)}} = \alpha + \beta \, (r_x - 1) + \varepsilon_x .$$

      Furthermore, our empirical analyses have shown that $\alpha$ in the equation is very close to zero and can thus be ignored. As shown in Table 1, the slope $\beta$ is close to one for years 1861-1900, which is the most relevant period for our purposes. Therefore, we can conclude that

      $$\frac{D_x - \hat{D}_x^{(1)}}{\hat{D}_x^{(1)}} \approx r_x - 1 ,$$

      and thus let $\hat{D}_x^{(2)} = r_x \, \hat{D}_x^{(1)}$, which completes the adjustment of the estimated death counts for differences in cohort size.

      After making this cohort size adjustment, the values of $\hat{D}_x^{(2)}$ need to be adjusted further to ensure that the estimates add up to the original totals for the age groups of the raw data. For example, if $5 \le x \le 9$, then

      $$\hat{D}_x^{(3)} = \hat{D}_x^{(2)} \cdot \frac{D_{5\text{-}9}}{\sum_{i=5}^{9} \hat{D}_i^{(2)}} ,$$

      where $D_{5\text{-}9}$ is the reported number of deaths for ages 5-9.

      Finally, a small inadequacy in the interpolation method affecting the results for ages 1 and 2 was noted. The estimated number of deaths for age 1 was consistently too low, while the opposite occurred for age 2. An ad hoc correction procedure was developed based on an analysis of more detailed data for years 1861-1890 and then applied to the results for all earlier years. We first computed the average error in the estimated deaths for age 1 relative to the total number of deaths for ages 1-2. This quantity was fairly constant during the period 1861-1890 (rising noticeably thereafter), and thus we computed its average over these years alone:

      $$\frac{1}{30} \sum_{t=1861}^{1890} \frac{D_1(t) - \hat{D}_1^{(3)}(t)}{D_1(t) + D_2(t)} = 0.0311 \quad \text{(females)} ,$$

      where $D_1$ and $D_2$ are the actual deaths for age 1 and 2, respectively, and $\hat{D}_1^{(3)}$ is the current estimate of the death count for age 1. Thus, for females, $\hat{D}_1^{(3)}$ and $\hat{D}_2^{(3)}$ need to be adjusted in opposite directions by an amount equal to 0.0311 times the total number of deaths for ages 1-2, $D_{1\text{-}2}$, which is known even during the period 1751-1860:

      $$\hat{D}_1^{(4)} = \hat{D}_1^{(3)} + 0.0311 \, D_{1\text{-}2}$$

      and

      $$\hat{D}_2^{(4)} = \hat{D}_2^{(3)} - 0.0311 \, D_{1\text{-}2} ,$$

      where $\hat{D}_1^{(4)}$ and $\hat{D}_2^{(4)}$ are the final estimates of death counts for ages 1 and 2. For males, the adjustment is made using a correction factor of 0.0326.
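The three adjustments described in this subsection (relative cohort size, agreement with the reported group total, and the ad hoc shift between ages 1 and 2) can be sketched as follows. The function names and all numbers except the female correction factor of 0.0311 are hypothetical, and the cohort size factor is taken with a regression slope of exactly one, as the text concludes.

```python
def adjust_group(initial, cohort_births, group_total):
    """Apply the cohort size adjustment, then rescale to the group total.

    initial       : initial estimates for the single ages in one age group
    cohort_births : mean birth counts of the cohorts at each of those ages
    group_total   : reported deaths for the whole age group
    """
    mean_births = sum(cohort_births) / len(cohort_births)
    # Cohort size adjustment: multiply by relative cohort size.
    sized = [d * b / mean_births for d, b in zip(initial, cohort_births)]
    # Rescale so that the estimates add up to the reported total.
    scale = group_total / sum(sized)
    return [d * scale for d in sized]


def correct_ages_1_and_2(deaths_1, deaths_2, factor):
    """Shift deaths from age 2 to age 1 by `factor` times their total."""
    shift = factor * (deaths_1 + deaths_2)
    return deaths_1 + shift, deaths_2 - shift


final = adjust_group([100, 90, 80, 70, 60],
                     [1000, 1100, 1000, 900, 1000],
                     804)
age1, age2 = correct_ages_1_and_2(400, 600, 0.0311)   # females
```

In the first call the larger cohort at the second age raises its estimate relative to its neighbours, and the rescaling then forces the five values to sum to 804. The second call moves 31.1 deaths from age 2 to age 1 without changing their total.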

    2. Census counts

      Population data during 1751-1860, based on census counts at 5-year intervals, were available only in 5-year age groups (0-4, 5-9, 10-14, ... , 85-89, 90+). These numbers were split into single-year age groups by nearly the same method used for the death counts. In fact, it was easier to apply the procedure in this case, since the available age groups were more regular over age and time.

      First, initial estimates, $\hat{P}_x^{(1)}$, were obtained by log-linear interpolation. Second, these estimates were adjusted by the formula

      $$\hat{P}_x^{(2)} = r_x^{*} \, \hat{P}_x^{(1)} .$$

      Note that the cohort size adjustment factor, $r_x^{*}$, is slightly different from the one used in the previous case. Here,

      $$r_x^{*} = \frac{B_x^{*}}{\bar{B}_x^{*}} ,$$

      where $B_x^{*}$ equals the birth count of the cohort that is aged x to x+1 at the time of the census (which occurs on December 31 of the census year), and $\bar{B}_x^{*}$ equals the mean of the five values of $B_x^{*}$ in the associated 5-year age group. Third, this set of estimates was adjusted further to ensure that totals within 5-year age groups add up to their original values (see method described earlier for the deaths). For the census estimates, there was no need to make a final adjustment for the age group 1-2.

  2. From single-year age groups into Lexis triangles (deaths only)

    After splitting 5-year death data for years 1751-1860 into single-year age groups, we then split single-year death data for years 1751-1900 into Lexis triangles using a linear regression model. For years 1901-1991, raw data by individual triangles were available at all ages, and these data were then used to derive the model that was subsequently applied for splitting the single-year data for earlier years.

    Many linear models were fit to the raw triangle data for 1901-1991, and seven of these models are summarized in Tables 2a (females) and 2b (males). In all cases, the dependent variable in these models was the proportion of deaths in each 1x1 Lexis square contained in the lower (right-hand) triangle of that square. All models were fit by weighted least squares, where the weights were proportional to the number of deaths in the entire Lexis square.
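As a rough illustration of the fitting approach, the sketch below performs a weighted least squares fit with a single predictor and reports the weighted R-squared (explained share of weighted variance about the weighted mean). It is a deliberate simplification: the models in Tables 2a and 2b contain many more terms, and the function name and data here are invented.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    ss_res = sum(wi * (yi - intercept - slope * xi) ** 2
                 for wi, xi, yi in zip(w, x, y))
    return intercept, slope, 1 - ss_res / ss_tot


# Made-up Lexis squares: birth proportion, lower-triangle death
# proportion, and total deaths in the square (used as the weight).
birth_prop = [0.48, 0.50, 0.52, 0.51]
death_prop = [0.1 + 0.8 * b for b in birth_prop]
total_deaths = [120, 300, 80, 200]
intercept, slope, r2 = weighted_linear_fit(birth_prop, death_prop, total_deaths)
```

Because the invented death proportions lie exactly on a line, the fit recovers that line and the weighted R-squared is essentially one; real data would of course scatter around the fitted values.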

    All of the coefficients in the most comprehensive model, Model VII, are statistically significant at the 0.001 level. Thus, all the terms included in these models have some descriptive value. The age pattern of variation in the proportion of lower-triangle deaths (higher than average in the earliest and latest years of life) is easily visible from an inspection of the age coefficients for Models I-IV. For the more complicated models, it is necessary to combine the effects of the interaction terms with the age coefficients in order to observe this same age pattern. Note: These linear models were fit using both single-year age groups (0-104) and 5-year age groups (only the latter are shown in Tables 2a and 2b). For all computational purposes, we used the models with single-year coefficients. The models with 5-year coefficients are used for presentation purposes only, although in fact they produce only slightly inferior results if used for computation as well.

    The next variable in these models is labeled "birth proportion," which is the number of births in the younger cohort (corresponding to the lower triangle of the Lexis square) as a proportion of the total births in the two cohorts that traverse the Lexis square in question. As such, this variable is analogous to the "death proportion," which is the dependent variable in these models. Therefore, a positive coefficient is a reasonable result.

    The death proportions tend to increase over the observation period, as reflected in the positive coefficient for the variable "year" in Models IV through VII. This increase is much stronger for age 0, as seen in the interaction term. A curious result, however, is that the rapid increase in the death proportion for age 0 reverses itself around 1965. This change is captured in the next set of interaction terms and can be observed graphically in Figure 1a. A final pair of coefficients is included in order to reflect the impact of the Spanish flu epidemic during the winter of 1918-19. The increase in the death rate during that winter greatly elevated the proportion of deaths in lower triangles for 1918 (dominated by months at the end of the year) and in upper triangles for 1919 (dominated by months at the beginning of the year).

    Having fit these seven models, we may be tempted to extrapolate the proportion of lower triangle deaths using the best-fitting model, Model VII, into the earlier time periods (1751-1900). An obvious problem with this approach is well illustrated in Figures 1a, 1b, and 1c, since the predictions outside of the observation period quickly become implausible. Especially for age 0, but for other ages as well, it does not seem believable that the proportion of lower-triangle deaths in earlier time periods could have been substantially below 0.5, as predicted by an extrapolation of Model VII.

    A visual inspection of the raw data in the earliest portion of the observation period, 1901-1910, seems to show no trend over time, especially for age 0 (which is less affected by random variation because of the large number of deaths). This result suggests that the levels of these death proportions may not have increased substantially in earlier years. Therefore, we chose to derive a predicted proportion of lower triangle deaths for years 1751-1900 using Model VII but holding the value of the year variable constant at 1910. This assumption eliminates the implausible time trend for years before 1900 and yields predictions of the level of the death proportion that are similar to those observed in 1901-1910. These predictions still include the effect of variations in the birth proportion variable when available. Note: Birth counts were available back to 1749 only. For earlier cohorts, we set the birth proportion variable equal to 0.5. Thus, at advanced ages, the predicted proportion of lower-triangle deaths was constant prior to some time period.
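Once a predicted proportion of lower-triangle deaths is available, applying it is straightforward. The sketch below shows the split itself and the fallback value of 0.5 for the birth proportion when birth counts are missing. The function names and numbers are hypothetical, and the predicted proportion is simply passed in rather than computed from the fitted model.

```python
def birth_proportion(births_younger, births_older):
    """Share of births in the younger of the two cohorts in a Lexis square.

    Returns 0.5 when either birth count is unavailable (None).
    """
    if births_younger is None or births_older is None:
        return 0.5
    return births_younger / (births_younger + births_older)


def split_into_triangles(deaths, lower_proportion):
    """Split deaths in a 1x1 Lexis square into (lower, upper) triangles."""
    if not 0.0 <= lower_proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    lower = deaths * lower_proportion
    return lower, deaths - lower


known = birth_proportion(52000, 48000)
missing = birth_proportion(None, 48000)
lower, upper = split_into_triangles(200, 0.55)
```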

  3. Discussion

    The method employed for splitting data from 5-year age groups into single-year age groups was the third of three methods that we tested for this purpose. We refer to the final method as "log-linear," since it consists of a linear interpolation applied to logarithms of the raw data. The other two methods are called "linear" and "spline." The linear method differs from the log-linear only in the fact that it does not employ logarithms: in other words, a linear interpolation is applied directly to the raw data. The spline method is also applied to the raw data (without logarithms). Instead of a simple linear interpolation, however, it involves a more complicated method of fitting moving sequences of cubic splines to raw death (or census) counts. After various comparisons, we determined that the log-linear method was, in general, at least as accurate as (and in some cases more accurate than) the other two methods. It also has a clear advantage of simplicity compared with the spline method.

    Unfortunately, it was not possible to use the same method for the two steps in the total splitting process. In other words, our best method for splitting data from 5-year to single-year age groups is fundamentally different from our method for splitting single-year age groups into Lexis triangles. Of course, this distinction matters only for the death counts. Obviously, all of the variants of the interpolation method have the advantage of simplicity compared to the linear regression model, and we would have preferred to apply one of these methods (in particular, the log-linear interpolation method) also for splitting single-year data into Lexis triangles. The interpolation methods failed in this case, however, and we were forced to employ a more complicated strategy involving a linear regression model. Briefly, the reason that the simpler interpolation methods do not work for this purpose is that the pattern of death counts by Lexis triangle is affected not only by age, but also by the seasonality of mortality (since upper-triangle deaths are weighted disproportionately toward the more lethal winter months).

 

Table 2a

Seven Linear Models of the Proportion of Lower-Triangle Deaths *

Swedish Females, Ages 0-104, Years 1901-1991

 

                                       I        II       III        IV         V        VI       VII
Intercept                         0.5124    0.1580    0.1584   -0.6478   -0.6678   -0.6999   -0.8036
Age groups **
  0                               0.2207    0.2215    0.2223    0.2272   -5.5836   -6.2221   -6.1810
  1                               0.0551    0.0560    0.0555    0.0644    0.3261    0.3551    0.3538
  2-4                            -0.0038   -0.0029   -0.0042    0.0049    0.2667    0.2957    0.2934
  5-9                            -0.0098   -0.0091   -0.0109   -0.0018    0.2601    0.2892    0.2862
  10-14                          -0.0207   -0.0200   -0.0221   -0.0131    0.2490    0.2780    0.2746
  15-19                          -0.0156   -0.0150   -0.0180   -0.0090    0.2533    0.2824    0.2780
  20-24                          -0.0130   -0.0126   -0.0159   -0.0071    0.2554    0.2844    0.2796
  25-29                          -0.0131   -0.0129   -0.0165   -0.0079    0.2547    0.2838    0.2785
  30-34                          -0.0133   -0.0134   -0.0160   -0.0095    0.2534    0.2825    0.2783
  35-39                          -0.0176   -0.0175   -0.0185   -0.0146    0.2486    0.2776    0.2751
  40-44                          -0.0182   -0.0183   -0.0181   -0.0168    0.2468    0.2759    0.2744
  45-49                          -0.0163   -0.0166   -0.0158   -0.0166    0.2475    0.2765    0.2756
  50-54                          -0.0186   -0.0187   -0.0177   -0.0196    0.2447    0.2738    0.2730
  55-59                          -0.0147   -0.0147   -0.0135   -0.0164    0.2481    0.2772    0.2764
  60-64                          -0.0192   -0.0194   -0.0181   -0.0219    0.2429    0.2719    0.2713
  65-69                          -0.0206   -0.0209   -0.0194   -0.0240    0.2409    0.2700    0.2694
  70-74                          -0.0226   -0.0231   -0.0215   -0.0268    0.2383    0.2673    0.2668
  75-79                          -0.0231   -0.0237   -0.0220   -0.0282    0.2371    0.2662    0.2657
  80-84                          -0.0223   -0.0230   -0.0213   -0.0286    0.2370    0.2660    0.2654
  85-89                          -0.0151   -0.0159   -0.0140   -0.0229    0.2432    0.2722    0.2716
  90-94                          -0.0045   -0.0053   -0.0034   -0.0139    0.2526    0.2816    0.2808
  95-99                           0.0056    0.0050    0.0070   -0.0054    0.2616    0.2906    0.2897
  100-104                         0.0207    0.0202    0.0222    0.0079    0.2755    0.3045    0.3033
Birth proportion ***                  --    0.7088    0.7032    0.8238    0.7575    0.7628    0.7716
Year                                  --        --        --    0.0004    0.0003    0.0003    0.0003
Year x Age=0                          --        --        --        --    0.0032    0.0035    0.0035
Year ≥ 1966 x Age=0                   --        --        --        --        --   14.6083   14.6769
Year ≥ 1966 x Age=0 x Year            --        --        --        --        --   -0.0074   -0.0074
Spanish Flu
  Year=1918                           --        --    0.1090        --        --        --    0.1173
  Year=1919                           --        --   -0.0258        --        --        --   -0.0168
R-squared **** (5-year ages)      0.6465    0.6570    0.7004    0.6785    0.7259    0.7285    0.7769
R-squared **** (1-year ages)      0.6444    0.6549    0.6983    0.6765    0.7238    0.7265    0.7749

* - All models were fit by weighted least squares, with weights proportional to total deaths in each 1x1 Lexis square.

** - Coefficients for age groups are constrained to sum to zero; thus there is no omitted category.

*** - The birth proportion is analogous to the death proportion (dependent variable) in each 1x1 Lexis square (see text for further explanation).

**** - R-squared in this case is the proportion of weighted variance (about the weighted mean) that is explained by the model.

 

Table 2b

Seven Linear Models of the Proportion of Lower-Triangle Deaths *

Swedish Males, Ages 0-104, Years 1901-1991

 

                                       I        II       III        IV         V        VI       VII
Intercept                         0.5185    0.1329    0.1347   -0.6654   -0.6242   -0.6458   -0.7406
Age groups **
  0                               0.2253    0.2261    0.2267    0.2327   -5.6098    -5.980   -5.9432
  1                               0.0444    0.0453    0.0452    0.0545    0.3166    0.3335    0.3327
  2-4                            -0.0068   -0.0059   -0.0067    0.0023    0.2647    0.2816    0.2799
  5-9                            -0.0060   -0.0054   -0.0067    0.0018    0.2647    0.2815    0.2792
  10-14                          -0.0155   -0.0148   -0.0162   -0.0081    0.2549    0.2718    0.2692
  15-19                          -0.0252   -0.0247   -0.0269   -0.0196    0.2440    0.2608    0.2574
  20-24                          -0.0129   -0.0126   -0.0158   -0.0078    0.2560    0.2728    0.2682
  25-29                          -0.0033   -0.0032   -0.0068    0.0009    0.2649    0.2817    0.2765
  30-34                          -0.0057   -0.0057   -0.0083   -0.0029    0.2616    0.2784    0.2742
  35-39                          -0.0148   -0.0146   -0.0156   -0.0129    0.2519    0.2688    0.2663
  40-44                          -0.0209   -0.0210   -0.0209   -0.0204    0.2449    0.2617    0.2603
  45-49                          -0.0227   -0.0230   -0.0223   -0.0235    0.2421    0.2589    0.2581
  50-54                          -0.0211   -0.0209   -0.0200   -0.0224    0.2436    0.2604    0.2597
  55-59                          -0.0200   -0.0196   -0.0185   -0.0222    0.2442    0.2610    0.2603
  60-64                          -0.0202   -0.0201   -0.0188   -0.0236    0.2431    0.2600    0.2593
  65-69                          -0.0208   -0.0209   -0.0196   -0.0251    0.2419    0.2587    0.2580
  70-74                          -0.0212   -0.0216   -0.0202   -0.0261    0.2410    0.2578    0.2572
  75-79                          -0.0234   -0.0239   -0.0225   -0.0286    0.2386    0.2554    0.2548
  80-84                          -0.0225   -0.0234   -0.0221   -0.0283    0.2389    0.2558    0.2551
  85-89                          -0.0165   -0.0174   -0.0160   -0.0229    0.2446    0.2614    0.2608
  90-94                          -0.0080   -0.0090   -0.0075   -0.0156    0.2523    0.2691    0.2683
  95-99                           0.0081    0.0073    0.0088   -0.0010    0.2675    0.2843    0.2834
  100-104                         0.0298    0.0288    0.0308    0.0187    0.2879    0.3047    0.3041
Birth proportion ***                  --    0.7712    0.7640    0.8875    0.8033    0.8112    0.8177
Year                                  --        --        --    0.0004    0.0002    0.0002    0.0003
Year x Age=0                          --        --        --        --    0.0032    0.0034    0.0034
Year ≥ 1966 x Age=0                   --        --        --        --        --   13.9563    14.019
Year ≥ 1966 x Age=0 x Year            --        --        --        --        --   -0.0071   -0.0071
Spanish Flu
  Year=1918                           --        --    0.1022        --        --        --    0.1110
  Year=1919                           --        --   -0.0359        --        --        --   -0.0265
R-squared **** (5-year ages)      0.7014    0.7122    0.7449    0.7302    0.7808    0.7830    0.8194
R-squared **** (1-year ages)      0.7042    0.7150    0.7476    0.7330    0.7835    0.7858    0.8222

* - All models were fit by weighted least squares, with weights proportional to total deaths in each 1x1 Lexis square.

** - Coefficients for age groups are constrained to sum to zero; thus there is no omitted category.

*** - The birth proportion is analogous to the death proportion (dependent variable) in each 1x1 Lexis square (see text for further explanation).

**** - R-squared in this case is the proportion of weighted variance (about the weighted mean) that is explained by the model.

 

Constructing Cohort Lifetables

  1. Deaths in Lexis triangles were converted from period format to cohort format.
  2. Average population figures for years 1751-1860 were split from 5-year age groups (0, 1-4, …, 90+) to single years of age (0, …, 104) using log-linear interpolation.
  3. The results of step 2 were averaged over consecutive years to obtain January 1 population estimates by single years of age.
  4. Final January 1 population estimates for 1751-1860 combined the results of step 3 for ages 0-89 with January 1 population estimates for ages 90-119 obtained from the extinct cohorts method.
  5. January 1 population estimates for 1751-1860 were combined with January 1 population estimates for 1861-1995 and converted to cohort exposures using the formula P + 1/6 (D1 - D2), where D1 and D2 are the lower and upper triangle deaths of the birth cohort and P is the population at the beginning of the next corresponding calendar year.
  6. Deaths were divided by exposures to obtain cohort death rates. For incomplete cohorts, the death rates beyond age 1995-y, where y is the year of birth, were listed as 'NA'. In the calculation of life tables, however, the death rates at these (still unobserved) ages were equated with the period rates for 1991-1995.
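The exposure formula and the resulting cohort death rate can be written out as a short sketch. The function names and the numbers are hypothetical, and the handling of incomplete cohorts ('NA' values and substitution of period rates) is not shown.

```python
def cohort_exposure(population_jan1, lower_deaths, upper_deaths):
    """Cohort exposure from the formula P + 1/6 (D1 - D2)."""
    return population_jan1 + (lower_deaths - upper_deaths) / 6


def cohort_death_rate(population_jan1, lower_deaths, upper_deaths):
    """Deaths in both triangles divided by the cohort exposure."""
    exposure = cohort_exposure(population_jan1, lower_deaths, upper_deaths)
    return (lower_deaths + upper_deaths) / exposure


exposure = cohort_exposure(1000, 12, 6)    # made-up P, D1, D2
rate = cohort_death_rate(1000, 12, 6)
```

With these made-up inputs the exposure is 1001 person-years and the death rate is 18 / 1001.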

 

Notes on early data (1751-1860)

  1. There are two sets of population figures given for the time period 1751-1860. The census figures for these years have been corrected by Gustav Sundbarg for errors in age distribution. Exposure estimates for 1751-1860 are the official (uncorrected) mid-year population estimates. The uncorrected estimates are the correct figures to use in the calculation of death rates since it is likely that the same kinds of errors were made in recording both population counts and death counts.
  2. Death counts during 1751-1860 were consistently over-reported for age groups 30-34, 40-44, … , and 80-84, and under-reported for age groups 25-29, 35-39, …, and 85-89. This anomaly diminished over time during this period and effectively disappeared by 1860. No adjustment has been made for age misreporting at this time.