*This is the first part of a two-part series.*

**Abstract**

The purpose of this paper is to examine the intersection of research on the effects of (re)insurance risk diversification and the availability of big insurance data components for competitive underwriting and premium pricing. We study the combination of physical diversification by geography and insured natural peril with the complexity of aggregate structured insurance products, and how big historical and modeled data components affect product underwriting decisions. Under such market conditions, the availability of big data components facilitates accurate measurement of inter-dependencies among risks and the definition of an optimal and competitive insurance premium at the level of the firm and of the policy holders. In the second part of this article, we extend the discourse to a notional micro-economy and examine the impact of diversification and insurance big data components on the potential for developing strategies for sustainable and economical insurance policy underwriting. We review concepts of parallel and distributed algorithmic computing for big data clustering, mapping and resource reducing algorithms.

**Introduction**

This working paper examines how big data and fast compute platforms solve some complex premium pricing, portfolio structuring and accumulation problems in the context of flood insurance markets. Our second objective is to measure the effects of geo-spatial insurance risk diversification through modeling of interdependencies, and to show that such measures affect the definition of the single risk premium and its market cost. The single product case studies examine the pricing of insurance umbrella coverage. They are selected to address scenarios relevant to current (re)insurance market conditions under intense premium competition. We then extend the discourse to a micro-economy of multiple policy holders and aim to generalize some findings on economies of scale and diversification. The outcomes of all case studies and theoretical analysis depend on the availability of big insurance data components for modeling and pricing workflows. The quality, usability and computational cost of such data components determine their direct impact on the underwriting and pricing process and on the definition of the single risk cost of insurance.

**1.0 Pricing Aggregate Umbrella Policies**

Insurers are competing actively for insureds' premiums and looking for economies of scale to offset and balance premium competition and thus develop more sustainable long-term underwriting strategies. While writing competitive premium policies and setting up flexible contract structures, insurers are mindful of risk concentration and the lower bounds of fair technical pricing. Structuring of aggregate umbrella policies lends itself to underwriting practices of larger scales in market share and diversification. Only large insurers have the economies of scale to offer such products to their clients. Premium pricing of umbrella and global policies relies on both market conditions and mathematical modeling arguments. On the market and operational side, the insurer relies on lower cost of umbrella products due to efficiencies of scale in brokerage, claims management, administration and even in the computational scale-up of the modeling and pricing internal functions of its actuarial departments. In our study, we will first focus on the statistical modeling argument, and then we will define big data components, which allow for solving such policy structuring and pricing problems.

*We first set up the case study on a smaller scale, in the context of two risks with insured limits for flood of $90 million and $110 million. These risks are priced for combined river-rain and storm surge flood coverage, first with the two single limits priced separately and independently, and then in an aggregate umbrella insurance product with a combined limit of $200 million: (1.0)*

\[ \text{Umbrella}\,(200\text{M}) = \text{Limit 1}\,(90\text{M}) + \text{Limit 2}\,(110\text{M}) \]

The two risks are owned by a single insured and are located in a historical flood zone, less than 1 kilometer from each other. For premium pricing, we assume a traditional approach dependent on modeled expected values of insured loss and standard deviation of loss. To set the statistical mechanics of the case study, we have modeled flood insurance loss data samples \(Q_t\) and \(S_t\), respectively, for the two risks, from a stochastic simulation, \(T\). Modeled insured losses have an expected value \(E[\cdot]\) and a standard deviation \(\sigma[\cdot]\), which define a standard policy premium of \(\pi(\cdot)\).

When both policies' premiums are priced independently, by the standard deviation pricing principle we have (1.1):

\[ \pi(S_t) = E[S_t] + \sigma[S_t] \]

\[ \pi(Q_t) = E[Q_t] + \sigma[Q_t] \]

With non-negative loadings, it follows that (2.0):

\[ \pi(S_t) \ge E[S_t] \]

\[ \pi(Q_t) \ge E[Q_t] \]

Because both risks are owned by the same insured, we aggregate the two standard premium equations, using traditional statistical accumulation principles for expected values and standard deviations of loss (3.0):

\[ \pi(Q_t) + \pi(S_t) = E[S_t] + \sigma[S_t] + E[Q_t] + \sigma[Q_t] \]

\[ \pi(Q_t) + \pi(S_t) = E[S_t + Q_t] + \sigma[S_t] + \sigma[Q_t] \]

The theoretical joint insured loss distribution function \(f_{S,Q}(S_t, Q_t)\) of the two risks has an expected value of insured loss (4.0):

\[ E[S_t + Q_t] = E[S_t] + E[Q_t] \]

and a joint theoretical standard deviation of insured loss (4.1):

\[ \sigma[S_t + Q_t] = \sqrt{\sigma^2[S_t] + \sigma^2[Q_t] + 2\rho\,\sigma[S_t]\,\sigma[Q_t]} \]

We use these aggregation principles further to express the sum of the two single-risk premiums \(\pi(Q_t)\) and \(\pi(S_t)\), as well as to derive a combined premium \(\pi(Q_t + S_t)\) for an umbrella coverage product insuring both risks with equivalency in limits as in (1.0). An expectation of full equivalency in premium definition produces the following equality (4.2):

\[ \pi(Q_t + S_t) = E[S_t + Q_t] + \sqrt{\sigma^2[S_t] + \sigma^2[Q_t] + 2\rho\,\sigma[S_t]\,\sigma[Q_t]} = \pi(Q_t) + \pi(S_t) \]

The expression introduces a correlation factor \(\rho\) between the modeled insured losses of the two policies. In our case study, this correlation factor specifically expresses dependencies between historical and modeled losses for the same insured peril due to geo-spatial distances. Such correlation factors are derived by algorithms that measure how sensitive the dependencies of historical and modeled losses are to geo-spatial distances among risks. In this article, we will not delve into the definition of such geo-spatial correlation algorithms. We examine three general cases of dependence among flood risks arising from their geographical situation and distances: full independence, full dependence and partial dependence.
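The aggregation principles in (1.1), (4.0) and (4.1) can be checked directly on simulated loss samples. The following is a minimal sketch, not the modeling workflow used in this study: the loss samples are synthetic, generated from a hypothetical shared flood-intensity factor, and all names (`premium`, `s_losses`, `q_losses`) are illustrative assumptions.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    # Population standard deviation of a loss sample.
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def correlation(xs, ys):
    # Sample correlation factor (rho) between two loss samples.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std(xs) * std(ys))


def premium(losses):
    # Standard deviation pricing principle, as in (1.1): E[.] + sigma[.]
    return mean(losses) + std(losses)


# HYPOTHETICAL synthetic losses in $ millions (not data from the study).
# A shared factor stands in for the common flood intensity at two nearby sites.
rng = random.Random(42)
common = [rng.expovariate(1.0) for _ in range(10_000)]
s_losses = [min(90.0, 20.0 * c + 10.0 * rng.expovariate(1.0)) for c in common]
q_losses = [min(110.0, 25.0 * c + 15.0 * rng.expovariate(1.0)) for c in common]

joint_losses = [s + q for s, q in zip(s_losses, q_losses)]
rho = correlation(s_losses, q_losses)

# Right-hand side of (4.1), built from the single-risk statistics and rho.
sigma_joint_formula = math.sqrt(
    std(s_losses) ** 2
    + std(q_losses) ** 2
    + 2.0 * rho * std(s_losses) * std(q_losses)
)

sum_of_single_premiums = premium(s_losses) + premium(q_losses)
umbrella_premium = premium(joint_losses)

print(f"rho = {rho:.3f}")
print(f"sigma[S+Q] from sample = {std(joint_losses):.4f}")
print(f"sigma[S+Q] from (4.1)  = {sigma_joint_formula:.4f}")
print(f"sum of single premiums = {sum_of_single_premiums:.4f}")
print(f"umbrella premium       = {umbrella_premium:.4f}")
```

Because the sample correlation is below one, the umbrella premium computed on the joint sample is no larger than the sum of the two single-risk premiums.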

**2.0 Sub-Additivity, Dependence and Diversification**

*Scenario 2.1: Two Boundary Cases of Fully Dependent and Fully Independent Risks*

In the first boundary case, full dependence between the risks is expressed with a unit correlation factor. From first statistical principles, the sum of the standard deviations of loss of the fully dependent risks is then equal to the standard deviation of the joint loss distribution of the two risks combined, as defined in equation (4.1). This gives (4.3):

\[ \sigma[S_t + Q_t] = \sqrt{\sigma^2[S_t] + \sigma^2[Q_t] + 2\,\sigma[S_t]\,\sigma[Q_t]} = \sigma[S_t] + \sigma[Q_t] \]

For expected values of loss, we already have a known theoretical relationship between the single risks' expected insurance loss and the umbrella product's expected loss in equation (4.0). Combining the equalities for the two components of the standard premium definition, (4.0) and (4.3), yields full additivity in premiums between the single policies and the aggregate umbrella product, as described in equation (4.2) and shortened as (4.4):

\[ \pi(Q_t + S_t) = \pi(Q_t) + \pi(S_t) \]

Some underwriting conclusions are evident from this analysis. When structuring a combined umbrella product for fully dependent risks, with nearly identical geographical location, the same insured peril and the same line of business, the price of the aggregated umbrella product should approach the sum of the single-risk premiums priced independently. The absence of diversification in geography and insured catastrophe peril prevents any significant opportunities for cost savings or competitiveness in premium pricing. The summation of riskiness from single policies to aggregate forms of products is linear and co-monotonic. Economies of market share scale do not play a role in highly clustered and concentrated pools of risks, where diversification is not achievable and inter-risk dependencies are close to perfect. In such scenarios, the impact of big data components on underwriting and pricing practices is not as prominent, because standard premiums for single risks and aggregated products can be obtained from theoretical formulations.

In our second boundary case of *full and perfect independence*, when two or more risks with two separate insurance limits are priced independently and separately, the summation of their premiums is still required for portfolio accumulations by line of business and by geographic and administrative region. Practitioners accomplish this premium accumulation task, or "roll-up," of fully independent risks in accordance with the linear principles of equation (3.0). However, if we are to structure an aggregate umbrella cover for these same single risks with an aggregated premium of \(\pi(Q_t + S_t)\), the effect of statistical independence, expressed with a zero correlation factor, will reduce equation (3.0) to equation (5.0):

\[ \pi(Q_t + S_t) = E[S_t + Q_t] + \sqrt{\sigma^2[S_t] + \sigma^2[Q_t]} \]

Full independence among risks supports, more strongly than any other case, the premium sub-additivity principle, which is stated in (6.0):

\[ \pi(Q_t + S_t) \le \pi(Q_t) + \pi(S_t) \]

An expanded expression of the sub-additivity principle is easily derived from the linear summation of premiums in (3.0) and the expression of the combined single insurance product premium in (5.0). Some policy and premium underwriting guidelines can be derived from this regime of full statistical independence. Under conditions of full independence, when two risks are priced independently and separately, the sum of their premiums will always be larger than the premium of an aggregate umbrella product covering these same two risks. The physical and geographic characteristics of full statistical independence for modeled insurance loss are large geo-spatial distances and independent insured catastrophe perils and business lines. In practice, this is generally defined as insurance risk portfolio diversification by geography, line and peril. In insurance product terms, we showed that diversification by geography, peril and line of business, which are the physical prerequisites for statistical independence, allows us to structure and price an aggregate umbrella product with a premium less than the sum of the independently priced premiums of the underlying insurance risks. In this case, unlike the case of full dependence, big data components have a computing and accuracy function to play in the underwriting and price definition process.

Once the sub-additivity of the aggregate umbrella product premium as in (6.0) is established, this premium is back-allocated to the single component risks covered by the insurance product. This is done to measure the relative riskiness of the assets under the aggregate insurance coverage and each risk's individual contribution to the formation of the aggregate premium. The back-allocation procedure is described further in the article in the context of a notional micro-economy case.

*Scenario 2.2: Less Than Fully Dependent Risks*

In our case study, we have geo-spatial proximity of the two insured risks in a known flood zone with measured and available averaged historical flood intensities, which leads to a measurable statistical dependence of modeled insurance loss. We express this dependence with a computed correlation factor in the interval \(0 < \rho' < 1\). Partial dependence with such a correlation factor has an immediate impact on the theoretical standard deviation of combined modeled loss, which is a basic quantity in the formulation of risk and loading factors for premium definition:

\[ \sigma[S_t + Q_t] = \sqrt{\sigma^2[S_t] + \sigma^2[Q_t] + 2\rho'\,\sigma[S_t]\,\sigma[Q_t]} \le \sigma[S_t] + \sigma[Q_t] \]

This turns the equality in (4.3) into an inequality and, as in the case of complete independence, produces an inequality between the premium of the aggregate umbrella product and the sum of the independently priced single-risk premiums (7.0):

\[ \pi(Q_t + S_t) = E[S_t + Q_t] + \sqrt{\sigma^2[S_t] + \sigma^2[Q_t] + 2\rho'\,\sigma[S_t]\,\sigma[Q_t]} \le \pi(Q_t) + \pi(S_t) \]

The principle of premium sub-additivity (6.0), as in the case of full independence, again comes into force. The expression of this principle is not as strong with partial dependence as with full statistical independence, but we can clearly observe a theoretical ranking of the aggregate umbrella premiums \(\pi(Q_t + S_t)\) in the three cases reviewed so far (7.1):

\[ \pi^{\text{Full Independence}} \le \pi^{\text{Partial Dependence}} \le \pi^{\text{Full Dependence}} \]

This theoretical ranking is further confirmed in the next section with computed numerical results. Less than full dependencies, i.e. partial dependencies among risks, can still be viewed as a statistical modeling argument for diversification in market share geography, line of business and insured peril. Partial but effective diversification still offers an opportunity for competitive premium pricing. In insurance product and portfolio terms, our study shows that partial or imperfect diversification by geography affects the sensitivity of premium accumulation and allows for cost savings in premium for aggregate umbrella products vs. the summation of multiple single-risk policy premiums.

**3.0 Numerical Results of Single-Risk and Aggregate Premium Pricing Cases**

In our flood risk premium study, we modeled and priced three scenarios, using the classical formulas for a single-risk premium in equation (1.1) and for umbrella policies in equation (7.0). In our first scenario, we price each risk separately and independently, with insured limits of $90 million and $110 million. In the remaining scenarios, we price an umbrella product with a limit of $200 million in three sub-cases, with correlation factors of {1.0, 0.3 and 0.0}, representing full dependence, partial dependence and full independence of modeled insured loss, respectively. We use stochastic modeled insurance flood losses computed with a high geo-spatial granularity of 30 meters.

The numerical results of our experiment fully support the conclusions and guidelines that we derived earlier from theoretical statistical relationships. For fully dependent risks in close proximity, the sum of single-risk premiums approaches the price of an umbrella product priced with a 1.0 (100%) correlation factor. This is the stochastic relationship of full premium additivity. For partially dependent risks, the price of a combined product, modeled and priced with a 0.3 (30%) correlation factor, can be less than the sum of single-risk premiums. For fully independent risks, priced with a 0.0 (0%) correlation factor, the price of the combined insurance cover decreases further, below the price of an umbrella on partially dependent risks (30% correlation). Partial dependence and full independence support the stochastic ordering principle of premium sub-additivity. The premium ranking relationship in (7.1) is strongly confirmed by these numerical pricing results.
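The three pricing sub-cases can be reproduced in a few lines. The sketch below is illustrative only: the expected losses and standard deviations are hypothetical placeholders chosen for round numbers, not the modeled results of this study, and `umbrella_premium` is an assumed helper name. Only the correlation factors {1.0, 0.3, 0.0} come from the text.

```python
import math


def umbrella_premium(mean_s, sd_s, mean_q, sd_q, rho):
    # Umbrella premium under the standard deviation principle:
    # E[S + Q] plus the joint standard deviation for correlation rho.
    joint_sd = math.sqrt(sd_s**2 + sd_q**2 + 2.0 * rho * sd_s * sd_q)
    return mean_s + mean_q + joint_sd


# HYPOTHETICAL loss statistics in $ millions (assumed for illustration).
MEAN_S, SD_S = 4.0, 9.0    # risk with the $90 million limit
MEAN_Q, SD_Q = 5.0, 12.0   # risk with the $110 million limit

# Scenario 1: each risk priced separately and independently, then summed.
sum_of_singles = (MEAN_S + SD_S) + (MEAN_Q + SD_Q)

# Umbrella sub-cases: full dependence, partial dependence, full independence.
premiums = {
    rho: umbrella_premium(MEAN_S, SD_S, MEAN_Q, SD_Q, rho)
    for rho in (1.0, 0.3, 0.0)
}

print(f"sum of single-risk premiums: {sum_of_singles:.2f}")
for rho, value in premiums.items():
    print(f"umbrella premium, rho = {rho:.1f}: {value:.2f}")
```

With these assumed inputs, the umbrella premium at a correlation of 1.0 equals the sum of the single-risk premiums (30.0), falls to about 26.02 at 0.3, and to 24.0 at 0.0, which is the ordering expected from the sub-additivity principle.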

*Less than full dependence among risks, which is a very likely and practical measurement in real insurance umbrella coverage products, could still be viewed as the statistical modeling argument for diversification in market share geography. Partial and incomplete dependence theoretically and numerically supports the argument that partial but effective diversification offers an opportunity for competitive premium pricing.*
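The link between the correlation factor and the premium saving can be made explicit in one step. The short derivation below is a sketch that uses only the standard deviation pricing principle and the joint standard deviation with correlation factor ρ; no additional assumptions are introduced.

```latex
\begin{align*}
\bigl(\sigma[S_t] + \sigma[Q_t]\bigr)^2 - \sigma^2[S_t + Q_t]
  &= 2\,(1 - \rho)\,\sigma[S_t]\,\sigma[Q_t] \;\ge\; 0
  \qquad \text{for } \rho \le 1, \\[4pt]
\pi(Q_t) + \pi(S_t) - \pi(Q_t + S_t)
  &= \sigma[S_t] + \sigma[Q_t]
     - \sqrt{\sigma^2[S_t] + \sigma^2[Q_t] + 2\rho\,\sigma[S_t]\,\sigma[Q_t]}
  \;\ge\; 0 .
\end{align*}
```

The saving is zero at ρ = 1 and grows as ρ falls, so any correlation factor below one leaves room for an umbrella premium below the sum of the single-risk premiums.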
