
Predicting CO2 Emissions: A Neural Network Approach

Dr. Rao K. Bhogeswara, Missouri State University, Computer Science Department, 901 South National Avenue, Springfield, MO 65897, 417-836-4157, [email protected]
Dr. Randall S. Sexton, Missouri State University, Computer Information Systems Department, 901 South National Avenue, Springfield, MO 65897, 417-836-6453, [email protected]
Dr. Mike Hignite, Missouri State University, Computer Information Systems Department, 901 South National Avenue, Springfield, MO 65897, 417-836-6893, [email protected]

Abstract

In this work an attempt is made to use the Neural Network Simultaneous Optimization Algorithm (NNSOA) to predict carbon dioxide (CO2) emissions for different regions and countries around the world. With concerns about global warming continuing to grow, this research attempts not only to predict CO2 emissions but also to identify relevant variables that contribute to the accurate prediction of such emissions. Being able both to predict carbon emissions and to identify specific indicators of such emissions should make it possible to control CO2 emissions more effectively. The results of the research demonstrate that the NNSOA performed better than simple linear regression in predicting CO2 emissions while at the same time identifying relevant variables in the model.

Keywords: Neural Network, Prediction, CO2 Emissions, Global Warming, NNSOA

1 Introduction

Global warming and subsequent climate change have been the topics of a great deal of discussion for many years, with much of that recent discussion focused on mitigating the potential harmful impacts on the environment. Despite the political controversy surrounding the topic, the majority of climate scientists surveyed continue to believe that global warming is caused by greenhouse gases produced as a consequence of industrialization (Doran et al. 2009). Greenhouse gases consist of carbon dioxide (CO2), nitrous oxide (N2O), methane (CH4) and fluorinated gases; their shares of total global greenhouse gases in 2004 are shown in the pie chart in Figure 1 (Pachauri et al. 2007).

Figure 1 Global Man-made Greenhouse Gas Emissions in 2004

Source: IPCC 4th Assessment Report: Climate Change 2007: Synthesis Report

As shown in the graph, a major source of greenhouse gases is the use of fossil fuels by humans for transportation and power generation - processes that release CO2 into the atmosphere. The consumption of fossil fuels made up more than half of the total greenhouse gases generated in 2004. The generation of greenhouse gases varies across the globe based on several factors, which will be discussed later, but its effects are felt worldwide because such gases are released into the earth's atmosphere. The top 20 countries that emitted the highest levels of CO2 during 2006 are shown in Figure 2. As can be seen from the figure, China leads this list, followed closely by the United States (Union of Concerned Scientists 2006).

Figure 2 Top 20 Countries by CO2 Emissions in 2006

Source: http://www.ucsusa.org/global_warming/science_and_impacts/science/graph-showingeach-countrys.html

1.1 Causes of CO2 emissions

According to the U.S. EPA website (Environmental Protection Agency 2010), the causes of greenhouse gas emissions are stated as follows:

Carbon Dioxide (CO2): Carbon dioxide enters the atmosphere through the burning of fossil fuels (oil, natural gas, and coal), solid waste, trees and wood products, and also as a result of other chemical reactions (e.g., manufacture of cement). Carbon dioxide is also removed from the atmosphere (or sequestered) when it is absorbed by plants as part of the biological carbon cycle.

Methane (CH4): Methane is emitted during the production and transport of coal, natural gas, and oil. Methane emissions also result from livestock and other agricultural practices and by the decay of organic waste in municipal solid waste landfills.

Nitrous Oxide (N2O): Nitrous oxide is emitted during agricultural and industrial activities, as well as during combustion of fossil fuels and solid waste.

Fluorinated Gases: Hydrofluorocarbons, perfluorocarbons, and sulfur hexafluoride are synthetic, powerful greenhouse gases that are emitted from a variety of industrial processes. Fluorinated gases are sometimes used as substitutes for ozone-depleting substances (i.e., CFCs, HCFCs, and halons). These gases are typically emitted in smaller quantities, but because they are potent greenhouse gases, they are sometimes referred to as High Global Warming Potential gases (High GWP gases).

1.2 Impact of CO2 emissions

It is believed that CO2 emissions, along with other greenhouse gases, cause the earth's atmosphere to warm, resulting in changes to the earth's climate [2]. For example, sea levels may rise as North and South Pole ice caps melt, leading to increased coastal flooding. Excess CO2 can also dissolve in ocean water and form carbonic acid (H2CO3), raising the acidity of the water. This increase in acidity would then cause imbalances for marine life in terms of nutrient levels found in their food. Increased water acidity could also dissolve the calcium carbonate used by marine life to form coral reefs, thus damaging or destroying that marine habitat. Global warming due to CO2 emissions also has the potential to change weather patterns across the globe, resulting in prolonged periods of drought and resulting famine. Given the potentially overwhelming negative impact of CO2 emissions on the environment, it would appear critical that we understand, identify and predict such emissions with accuracy.

1.3 Costs of CO2 emissions

The costs associated with high levels of CO2 emissions can be measured in terms of the impact on the climate. The economic costs for the U.S. due to climate change resulting from global warming are estimated to be as high as 3.6% of gross domestic product (GDP) (Ackerman et al. 2008). The costs due to the four global warming phenomena of hurricanes, real estate losses, energy costs and water costs are estimated to be around $1.9 trillion annually in today's dollars, some 1.8% of GDP, by 2100. Table 1 provides these estimated costs along with the regions of the country most likely affected.

Table 1 Economic Costs of Climate Change if Global Warming is Unchecked

                 In billions of 2006 dollars       As a percentage of GDP          U.S. Regions Most at Risk
                 2025    2050    2075    2100      2025    2050    2075    2100
Hurricanes       $10     $43     $142    $422      0.05%   0.12%   0.24%   0.41%   Atlantic and Gulf Coast
Real Estate      $34     $80     $173    $360      0.17%   0.23%   0.29%   0.35%   Atlantic and Gulf Coast
Energy Sector    $28     $47     $82     $141      0.14%   0.14%   0.14%   0.14%   Southeast and Southwest
Water Costs      $200    $336    $565    $950      1.00%   0.98%   0.95%   0.93%   Western
SUBTOTAL         $271    $506    $961    $1,873    1.36%   1.47%   1.62%   1.84%

Source: Ackerman F, Stanton EA. The cost of climate change. May 2008, Natural Resources Defense Council

Despite these estimates, it is difficult to put a price tag on the many associated costs of climate change: the potential loss of human life and health, species extinction, loss of unique ecosystems, increased social conflict, and other impacts extend far beyond any monetary measure. But by measuring the economic damage of global warming in the United States, we can begin to understand the magnitude of the challenges faced if no efforts are made to reduce contributions to climate change. Curbing CO2 and other such emissions will require a substantial investment,

but the cost of doing nothing will be far greater. Immediate action can save lives, avoid trillions of dollars of economic damage, and provide a path to solving one of the greatest challenges of the 21st century.

2 Neural Networks

This research attempts to predict CO2 emissions for different regions and countries around the world based on the following input variables: gross domestic product (GDP), manufacturing industry (IND) as a percentage of GDP, service industry (SRV) as a percentage of GDP, trade (TRD) as a percentage of GDP and adult illiteracy rate (ILR). In order to do this, we utilize a neural network (NN), which has been found to be a successful prediction tool for business problems as well as many other fields, such as technology, medicine, agriculture, engineering, and education. A simple Internet search using ArticleFirst produced over 11,000 articles on NNs. The NN used in this study incorporates the Neural Network Simultaneous Optimization Algorithm (NNSOA), a modified genetic algorithm, as its search technique. The Genetic Algorithm (GA) that was used as the base algorithm for the NNSOA has been shown, in comparisons with gradient search algorithms (variations of back-propagation), to outperform them on computer-generated problems as well as several real-world classification problems (Gupta et al. 2000; Sexton et al. 1998). Modifications of the GA were made that improve the algorithm's ability to generalize to data on which it was not trained and that give it the ability to recognize relevant versus irrelevant input variables (NNSOA). This has the advantage of giving the researcher or manager additional information about the problem itself. In our case, we will be able to see which of the inputs included in our data set actually help predict CO2 emissions. By doing so, we are one step closer to solving the global warming problem. The NNSOA was shown to outperform the GA on which it was based, as well as several back-propagation variations (Sexton et al. 2002). Also included in the NNSOA is the automatic determination of the optimal number of hidden nodes to include in the NN architecture. This feature alone saved us considerable time and effort compared with the trial-and-error techniques usually employed by other NN programs to find optimal architectures.

2.1 Generalization

By using a search algorithm that identifies unneeded weights in a solution, the solution can be applied to out-of-sample data with confidence that additional error will not be introduced into the estimates. In current NN practice, specifically NNs trained using back-propagation (BP), every available input that could possibly contribute to the prediction is included in the model. While this method can result in fairly good models, it has some obvious limitations. During the BP training process, if connections (or weights) are not actually needed for prediction, the NN is required by its derivative nature to find nonzero weights that essentially zero each other out for the training data. However, once this solution is applied to data that was not seen in the training data set (out-of-sample), the unneeded weights are unlikely to continue zeroing each other out and therefore add additional error to the estimate. This is a generalization problem. A search algorithm that is not based on derivatives, such as the NNSOA, allows weights in the model to be hard zeros and allows the objective function to be modified to add a penalty for every weight that is not a hard zero. In actuality, weights are never really removed, only replaced with hard zeros, which in effect removes them from the solution. In doing so, when the solution is applied to any data, whether training or testing, these weights can have no net effect on the estimates. With the NNSOA, weights can be added and removed automatically at each stage of the optimization process. As weights are added or eliminated during the optimization process, discontinuities are introduced into the objective function. This precludes using search algorithms that require a differentiable objective function and, in fact, precludes most standard hill climbing algorithms. Previous studies have explored gradient techniques that allow some of the weights to decay to zero or that reduce the size of the network during training (Baum et al. 1989; Burkitt 1991). These methods were found to have limited usefulness. Another alternative is to remove active weights or hidden nodes and then evaluate the impact. This method of weight reduction is basically trial-and-error and requires the user to retrain after every modification to the network. The NNSOA, on the other hand, is based on the genetic algorithm, which does not require a differentiable objective function and can handle discontinuities such as a penalty value for each nonzero weight in the solution. The improvement of generalization has been the topic of much research (Drucker et al. 1992; Karmin 1990; Kruschke 1989).

2.2 Identification of relevant inputs

An additional benefit of being able to set unneeded weights to zero is the identification of relevant inputs in the NN model. After a solution has been found, the weights can be examined to determine whether any input variable has all of its weights set to zero. If a particular input has all of its input weights set to zero, we can conclude that this variable is irrelevant to the NN model, since it will have no effect on the estimate. This is not to say the input contains no relevant information. It just means that, for this particular solution, the NN found no use for this input in helping to predict the output. This could mean two things. First, the variable has no value in predicting the output. In this case, if several different runs were conducted (changing the random seed used to initialize the network's starting points), it is likely that this input would be identified as irrelevant every time. The second case is not as clear as to the input's relevancy. After several runs the input may or may not be included in the final solution. In this case it is likely, and makes intuitive sense, that the information contained in this input is duplicated in other input variables. For example, suppose Inputs 1 and 2 contain some of the same information. In one NN run, Input 1 is eliminated from the model. However, in the second run Input 2 is eliminated. A third run might include both variables as relevant, capturing some of the relevant information from each. In either case, more information is gathered by this method, which gives the researcher or manager a better understanding of the problem. By determining the relevant inputs in the model, a manager can have a better understanding of the problem and will be better equipped to make decisions. Section 3 describes the NNSOA. This is followed by the hidden node search, the CO2 emission experiment, results and conclusions.
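To make this check concrete, the following is a minimal sketch (written here in Python rather than the authors' FORTRAN implementation) of how all-zero input connections might be flagged; the weight-matrix layout and function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def irrelevant_inputs(input_weights, tol=0.0):
    """Return indices of inputs whose connections to every hidden node are hard zeros.

    input_weights: array of shape (n_inputs, n_hidden) holding the trained
    input-to-hidden weights of one NN solution (layout assumed for illustration).
    """
    all_zero = np.all(np.abs(input_weights) <= tol, axis=1)
    return [i for i, dead in enumerate(all_zero) if dead]

# Example: input 1 has every weight set to a hard zero, so it is flagged irrelevant.
w = np.array([[0.7, -1.2], [0.0, 0.0], [0.3, 0.9]])
print(irrelevant_inputs(w))  # -> [1]
```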

3 The neural network simultaneous optimization algorithm

The following is an outline of the NNSOA used in this study. The NNSOA is used only to search for the input weights; prior research has found that using ordinary least squares (OLS) to determine the output weights is more efficient and effective (Sexton et al. 2003). A formal description of the basic GA can be found in Dorsey et al. (1997). Unlike back-propagation (BP), which moves from one point to another based on gradient information, the NNSOA simultaneously searches in many directions, which enhances the probability of finding the global optimum.

3.1 The NNSOA outline

3.1.1 Initialization

A population of 12 solutions is created by drawing random real values from a uniform distribution [-1, 1] for the input weights. This happens only once during the training process. The output weights are determined by OLS.
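A minimal sketch of this initialization step is shown below, assuming a Python/NumPy setting. The sigmoid hidden-layer transfer function, the handling of bias terms and all names are assumptions made for illustration; only the population size of 12, the uniform [-1, 1] draw for input weights and the OLS output-weight solve come from the description above.

```python
import numpy as np

def init_population(n_inputs, n_hidden, pop_size=12, rng=np.random.default_rng(0)):
    """Draw input-to-hidden weights (plus a bias row) uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0, size=(n_inputs + 1, n_hidden)) for _ in range(pop_size)]

def hidden_outputs(X, W):
    """Hidden-layer activations; a sigmoid transfer function is assumed here."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])      # append a bias column
    return 1.0 / (1.0 + np.exp(-Xb @ W))

def ols_output_weights(X, y, W):
    """Solve for the hidden-to-output weights by ordinary least squares."""
    H = np.hstack([hidden_outputs(X, W), np.ones((X.shape[0], 1))])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return beta
```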

3.1.2 Evaluation

Each member of the current population is evaluated by an objective function based on its sum of squared errors (SSE) in order to assign each solution a probability of being redrawn in the next generation. In order to search for a parsimonious solution, a penalty value is added to the SSE for each nonzero weight (or active connection). The objective function used in this study is

F = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 + C \cdot RMSE

where N is the number of observations in the data set, Y_i is the observed value of the dependent variable, \hat{Y}_i is the NN estimate, and C is the number of nonzero weights in the network. The penalty for keeping an additional weight varies during the search and is equal to the current value of the Root Mean Squared Error (RMSE). Based on this objective function, each of the 12 solutions in the population is evaluated. The probability of being drawn in the next generation is calculated by dividing the distance of the current solution's objective value from the worst objective value in the generation by the sum of all such distances in the current generation.
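The objective evaluation could be sketched as follows; this is an illustrative Python version of the formula above, not the paper's actual code.

```python
import numpy as np

def objective(y, y_hat, weights):
    """SSE plus a penalty of one current RMSE per nonzero weight."""
    sse = float(np.sum((y - y_hat) ** 2))
    rmse = np.sqrt(sse / len(y))
    c = int(np.count_nonzero(weights))   # C: active (nonzero) connections
    return sse + c * rmse
```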

3.1.3 Reproduction

A mating pool is created by selecting solutions from the current population based on their assigned probabilities. This is repeated until the entire new generation, containing 12 solutions, is drawn. The new generation contains only solutions that were in the previous generation. The only difference between the new generation and the old one is that some solutions (the ones with higher probabilities) may appear more than once, and the poorer solutions (the ones with lower probabilities) may not appear at all.
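A sketch of the selection-probability calculation and the reproduction draw, under the same illustrative Python assumptions, might look like this.

```python
import numpy as np

def selection_probabilities(obj_values):
    """Probability of each solution: its distance from the worst objective value,
    normalized by the sum of all such distances in the generation."""
    obj = np.asarray(obj_values, dtype=float)
    dist = obj.max() - obj                     # lower objective -> larger distance
    total = dist.sum()
    return dist / total if total > 0 else np.full(len(obj), 1.0 / len(obj))

def reproduce(population, obj_values, rng=np.random.default_rng(0)):
    """Draw a new generation of the same size, with replacement, by these probabilities."""
    p = selection_probabilities(obj_values)
    idx = rng.choice(len(population), size=len(population), replace=True, p=p)
    return [population[i].copy() for i in idx]
```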

3.1.4 Crossover

Once reproduction has produced a combination of solutions from the previous generation, the solutions are randomly paired, constructing 6 sets of parent solutions. A point is randomly selected for each pair of solutions in the range [1, w), where w is the number of weights in a solution. For example, if there were 36 weights in a solution, a number would be randomly drawn between 1 and 36. Once this point is selected, the paired solutions switch the weights up to and including that point. For example, if the random point turned out to be 15, the paired solutions would switch weights 1 through 15. This is done for each of the 6 pairs, creating 12 new solutions.
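The single-point crossover described above could be sketched as follows; the flattened weight-vector representation is an assumption made for illustration.

```python
import numpy as np

def crossover(parent_a, parent_b, rng=np.random.default_rng(0)):
    """Swap weights 1..k (k drawn from [1, w)) between two weight vectors."""
    a, b = parent_a.ravel().copy(), parent_b.ravel().copy()
    k = rng.integers(1, a.size)                   # crossover point in [1, w)
    a[:k], b[:k] = b[:k].copy(), a[:k].copy()     # exchange the first k weights
    return a.reshape(parent_a.shape), b.reshape(parent_b.shape)
```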

3.1.5 Mutation

For each weight in the population a random number is drawn; if the random value is less than 0.05, the weight is replaced by a value drawn randomly from the entire weight space. By doing this, the entire weight space is globally searched, enhancing the algorithm's ability to find global solutions or at least the global valley.

3.1.6 Mutation 2

For each weight in a generation a random number is drawn; if the random value is less than 0.05, a hard zero replaces the weight. By doing this, unneeded weights are identified as the search continues for the optimum solution. After this operator is performed, the new generation of 12 solutions begins again with evaluation, and the cycle continues until it reaches 70% of the maximum number of generations.
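Both mutation operators could be sketched as below; the weight-space bounds used for Mutation 1 are illustrative assumptions, since the paper does not state them.

```python
import numpy as np

def mutate(weights, rate=0.05, lo=-10.0, hi=10.0, rng=np.random.default_rng(0)):
    """Mutation 1: with probability `rate`, replace a weight with a value drawn
    over the whole weight space (the bounds here are assumed for illustration)."""
    w = weights.copy()
    mask = rng.random(w.shape) < rate
    w[mask] = rng.uniform(lo, hi, size=mask.sum())
    return w

def mutate_to_zero(weights, rate=0.05, rng=np.random.default_rng(0)):
    """Mutation 2: with probability `rate`, replace a weight with a hard zero."""
    w = weights.copy()
    w[rng.random(w.shape) < rate] = 0.0
    return w
```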

3.1.7 Convergence enhancement

Once 70% of the maximum number of generations has been completed, the best solution found so far replaces all the strings in the current generation. Each weight in the population of strings is then varied by a small random amount. These random amounts decrease to an arbitrarily small value as the number of generations increases toward the set maximum.

3.1.8 Termination

The algorithm terminates after a user-specified number of generations.
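A sketch of the convergence-enhancement step is given below; the starting perturbation size of 0.1 and the linear shrinking schedule are assumptions, since the paper states only that the random amounts become arbitrarily small.

```python
import numpy as np

def convergence_enhancement(best, generation, max_gen, pop_size=12,
                            start_frac=0.70, rng=np.random.default_rng(0)):
    """After 70% of the generations, seed the population with the best solution and
    perturb each weight by a small random amount that shrinks toward zero."""
    progress = (generation / max_gen - start_frac) / (1.0 - start_frac)
    scale = max(1e-6, 0.1 * (1.0 - progress))   # 0.1 is an illustrative starting step
    return [best + rng.uniform(-scale, scale, size=best.shape) for _ in range(pop_size)]
```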

4 Hidden node search

The number of hidden nodes included in each NN is determined automatically in the following manner. Each NN begins with 1 hidden node and is trained for a user-defined number of generations, or MAXHID. After every MAXHID generations, the best solution at that point is saved as the BEST solution and an additional hidden node is added to the NN architecture. The NN is reinitialized by using a different random seed for drawing the initial weights and trained again for MAXHID generations. The BEST solution is also included in this new generation by replacing the first solution with its weights. Since an additional hidden node creates more weights than are found in the BEST solution, these extra weights are set to hard zeros. This way, what has been learned in previous generations is preserved. Upon completion of this training, the best solution for this architecture is compared with the BEST solution. If it is better, it becomes the BEST solution and is saved for future evaluation. This process continues until a hidden node addition finds no solution better than the BEST solution. Once this occurs, the BEST solution and its corresponding architecture are trained for an additional user-defined number of generations, or MAXGEN, which completes the training process. Although two solutions could achieve the same value for the objective function, they may differ in their architecture. Dorsey et al. (1994) demonstrated that the NN could have a variety of structures that reduce to the same equivalent structure.
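The hidden-node growth procedure can be summarized in the following sketch, where train_fn stands in for a full NNSOA training run; it is an assumed placeholder, not part of the paper's code.

```python
def grow_hidden_nodes(train_fn, max_hid_gens=100, final_gens=10_000):
    """Add hidden nodes one at a time; stop when an extra node no longer helps.

    train_fn(n_hidden, gens, seed_solution) is assumed to run NNSOA training and
    return (solution, objective_value); the seed solution is carried forward with
    the new node's weights starting as hard zeros.
    """
    n_hidden = 1
    best, best_obj = train_fn(n_hidden, max_hid_gens, None)
    while True:
        n_hidden += 1
        cand, cand_obj = train_fn(n_hidden, max_hid_gens, best)
        if cand_obj < best_obj:
            best, best_obj = cand, cand_obj      # the larger architecture improved
        else:
            n_hidden -= 1                        # revert to the previous architecture
            break
    # Final training of the chosen architecture for MAXGEN generations.
    best, best_obj = train_fn(n_hidden, final_gens, best)
    return best, n_hidden
```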

5 CO2 emission problem and experiment

In recent years carbon emissions and global climate change have become the topics of much discussion. People have become more aware of the arguments put forth by scientists that carbon emissions have caused a warming of our environment and that there could be catastrophic consequences if emissions continue unabated. Some have even gone so far as to blame recent devastating weather events on this global climate change. In addition to its focus on carbon dioxide emissions, this paper differs from other studies by attempting to formulate a fuller, more explanatory model. In the past, most studies have focused mainly on what have been called reduced form equations, which essentially use income in terms of gross domestic product (GDP) as the only explanatory variable in the model. This paper includes additional variables which attempt to better account for the theoretical relationship between emission levels and income.

As a country develops, it will tend to shift away from its initial phase of being an agriculturally based economy to becoming more of an industrialized economy. It is assumed, quite logically, that the growing industrial sector will pollute more heavily than the agricultural sector it has overtaken. Then, as the country continues to develop, another shift occurs that moves the economy toward being more service oriented. This final shift provides an explanation for why the relationship between environmental degradation and income might become negative at a certain level of income: the growing service sector will tend to be cleaner than the industrial sector. Another factor that may contribute to the relationship between environmental degradation and income is the country's involvement in international trade. International trade allows rich countries to export their dirty industries to poorer countries, which tend to have more lenient regulations and fewer resources for enforcing regulations. Dasgupta et al. (2002) emphasize a different effect that trade openness can have. They point out that being open to trade would make the importation of cleaner technologies more feasible due to increased access to technology and the higher levels of income which typically accompany higher levels of trade. If this effect is dominant, a negative relationship would exist between trade openness and environmental degradation as measured in terms of CO2 emissions. Education and environmental awareness are also likely to be important factors affecting the level of environmental degradation, as pointed out by Dasgupta et al. (2002) and Hill and Magnani (2002). Here, the idea is that people will only care about reducing pollution if they are aware of its harmful effects. So, as education levels rise, people are expected to better understand how different types of pollution can affect their environment, and ultimately their health and well-being. This factor may be especially important when dealing with carbon dioxide emissions, because their harmful effects are neither immediate nor local; people must therefore become aware of them through education rather than direct experience. At very low income levels, people will be primarily concerned with survival and will likely accept an increase in income even if that increase comes at the cost of damaging the environment. At higher levels of income, the vast majority of the population will enjoy household incomes that are high enough that they do not have to be concerned with survival.

When this is the case, people are more likely to be concerned with environmental quality. What this basically means is that environmental quality is assumed to be a normal good. This study analyzes a panel data set of 113 countries for the period 1990 to 2001. The data were obtained from the World Bank's World Development Indicators data set. This is interesting to examine because the data set is more recent than the data employed by many of the studies discussed above. The dependent variable in all of these models, labeled CO2, is the level of carbon dioxide emissions per capita. This figure is measured in metric tons per capita and captures the emissions that result from activities such as the burning of fossil fuels and the manufacturing of concrete. The income figures in the models are all transformations of per capita GDP, which is measured in constant 2000 US Dollars. The next two independent variables in the model, labeled IND and SRV, are the value of industry as a percentage of GDP and the value of services as a percentage of GDP, respectively. These are included in order to capture the effects that sectoral shifts are expected to have on carbon dioxide emissions, which were discussed in the previous section. Theory would suggest that the coefficient estimate for industry's share of income will be positive, because a larger industrial sector should lead to more pollution. Services as a percentage of GDP, on the other hand, is expected to have a negative relationship with emissions, since the service sector should be relatively cleaner. A variable for openness to trade, labeled TRD, is also included as an explanatory variable in many of the models. This variable is the value of imports and exports as a percentage of each country's GDP. As noted in the discussion of theory, this explanatory variable could have several different effects on carbon dioxide emissions. Being open to trade could mean that rich countries export their dirty industries to poorer countries rather than reducing emissions. If this effect dominates, the relationship will appear to be ambiguous. If the effect of abatement technologies becoming cheaper and more available dominates, a negative relationship between openness and emissions should be observed. Another possibility is that more trade could mean more importation of fossil fuels and automobiles, which would likely lead to a positive relationship with emissions. The adult illiteracy rate, ILR, is also included in the models to serve as a proxy for the overall education level of each country. This figure was chosen because it is available for a larger number of countries than many other measures of education. Also, the adult illiteracy rate,

unlike school enrollment or education expenditure measures, provides insight into the education level of the adult population rather than that of the school-aged population. One serious drawback to using this measure is that the countries for which it is unavailable tend to be countries that have either very high or very low levels of income (Azomahou et al. 2006). These inputs are shown in Table 2a, with the coding of different regions of the world in Table 2b and country codes in Table 2c. Assigning different values to regions and countries enables sensitivity analysis by region and country, to compare and contrast CO2 emissions in various parts of the world.

Table 2a Neural Network Inputs

Input   Input Abb.   Input description
1       GDP          Gross Domestic Product
2       IND          Value of manufacturing industry as a % of GDP
3       SRV          Value of service industry as a % of GDP
4       TRD          Value of trade as a % of GDP
5       ILR          Adult illiteracy rate
6-9     rg1-rg4      Inputs 6 through 9 code regions of the world as shown in Table 2b
10-16   cc1-cc7      Inputs 10 through 16 code countries of the world as shown in Table 2c

Table 2b NN Region Codes

Region                              Alpha Code   rg1   rg2   rg3   rg4
East Asia                           EAS          1     0     0     0
Europe and West Asia                EUR          0     0     0     1
Latin America and South America     LSA          0     0     1     0
Middle East and North Africa        MENA         0     0     1     1
North America                       NA           0     1     0     0
South Asia                          SOA          1     1     0     1
Sub Saharan                         SSA          0     1     1     0
South and East Asia                 SEA          NA    NA    NA    NA

The number of observations collected totaled 1356. A ten-fold cross validation was conducted on these 1356 observations in order to add rigor to our study. We made 10 training sets and 10 corresponding test sets out of the 1356 observations. We did this by first randomizing the order of the observations, then taking off the last 136 observations and saving them into a test file. The remaining 1220 observations were saved into a training file. To make the next training and test files, we took the 136 test observations from the previous data set and placed them at the top of the training observations from the previous data set. We then took off the last 136 observations and saved them as the second test file, saving the remaining 1220 observations as the second training file. We did this for 9 data sets; for the 10th data set we had to change the numbers in the training and testing sets, because the total number of observations is not divisible by ten and we wanted to make sure that every observation appeared exactly once in a test set. The last training and testing sets included 1224 training observations and 132 testing observations. To add additional rigor to our study, we included other analysis techniques using the exact same data sets for comparison against the NNSOA. These included a backpropagation-trained NN and

linear regression. All three methods were compared using RMSE. The NNSOA was written in FORTRAN with a Visual Basic interface. The backpropagation software used in this study was the NeuroShell Predictor software by Ward Systems Group, Inc. SPSS software was used to conduct the linear regression. All methods were run on a 2.0 GHz Intel Core 2 Duo machine running the Windows XP operating system.
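The rotating train/test construction described above can be sketched as follows; the function and variable names are assumptions made for illustration.

```python
import numpy as np

def rotating_folds(data, n_folds=10, test_size=136, last_test_size=132,
                   rng=np.random.default_rng(0)):
    """Build train/test splits by rotating a fixed-size test block, mirroring the
    procedure described above (sizes shown are those used for the 1356 observations).

    data: 2-D array with one row per observation.
    """
    d = data[rng.permutation(len(data))]        # randomize the order once
    folds = []
    for k in range(n_folds):
        t = last_test_size if k == n_folds - 1 else test_size
        train, test = d[:-t], d[-t:]            # last t rows become the test set
        folds.append((train.copy(), test.copy()))
        d = np.vstack([test, train])            # move the test block to the top
    return folds
```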

5.1 Transformation of data

Before the collected data are randomized and made ready for ten-fold cross validation, each input is transformed either to lie in the range [0, 1] or to be normalized using the mean and standard deviation of that input. For the [0, 1] transformation, the original values of each input are divided by the maximum value of that input. Output values are not transformed, since they are not large and already fall within the range [0, 35].
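A minimal sketch of the [0, 1] max-scaling transformation described above is given below; the guard for all-zero columns is an added assumption to keep the 0/1 dummy inputs well defined.

```python
import numpy as np

def scale_inputs(X):
    """Divide each input column by its maximum so values fall in [0, 1].

    The alternative transformation mentioned above (subtract the column mean and
    divide by the standard deviation) would be applied analogously.
    """
    col_max = X.max(axis=0)
    col_max[col_max == 0] = 1.0          # leave all-zero columns unchanged
    return X / col_max
```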

5.2 Training with the NNSOA

With this approach, the NNSOA is allowed to find the optimal number of hidden nodes automatically by starting with 1 hidden node and searching for 100 generations (MAXHID). At the end of the first MAXHID generations, the best solution so far is saved and an additional hidden node is added to the network. The algorithm then trains for another MAXHID generations and, when complete, compares the best solution for the current structure with the previous one. This process repeats until the current structure concludes with its best solution being inferior to the previous one, at which time the network reverts to the previous structure (number of hidden nodes) and trains for 10,000 generations (MAXGEN).

5.3 Training the backpropagation software NeuroShell Predictor

The NeuroShell Predictor software (NS) was used on the same 10 training and 10 testing sets. The maximum allowed number of hidden nodes for this algorithm was set to 150. Once training completed for each of the 10 training sets, the resulting solution was applied to its corresponding test set.

5.4 Training with linear regression

Simple linear regression was also used in this study. Other approaches, such as a general linear model with transformed inputs (for example, the square of the GDP input), were not attempted in order to improve the performance of the regression technique. It is possible that a better solution could be found using regression methods more suitable to the data set in question, such as general or generalized linear regression with nonlinear inputs or nonlinear parameters, but those methods are not attempted in this work.

6 Results

6.1 Importance of data preparation

Values of certain inputs are three to four orders of magnitude larger than those of other inputs, owing to the nature of the input variables selected: GDP, IND, SRV, TRD and ILR take values on very different scales, while the indicator variables for regions and countries are either zero or one in their corresponding inputs (rg1 through rg4 for regions and cc1 through cc7 for countries). These differences in the magnitudes of the data would make convergence to a solution harder and hence hinder accurate prediction of CO2. Also, since a NN needs a representative sample for training in order to pick up the underlying function, the data were randomly sorted before making the 10 training and 10 corresponding testing sets.

6.2 RMSE Error Comparison

Table 3 includes the RMSE values for the testing data for all 10 runs, including the average and standard deviation between runs.

Table 3 Neural Net (NNSOA), NeuroShell (NS), Simple Linear Regression (SLR) Error Values

Data Run   NNSOA RMSE   NS RMSE   SLR RMSE
1          0.6494       1.2363    2.2499
2          0.6904       1.3678    2.1608
3          0.8877       1.3666    3.0804
4          0.9164       1.4715    2.8759
5          1.0789       1.7181    3.1957
6          0.6883       1.0130    2.1125
7          1.3023       1.8054    3.0313
8          0.9617       1.6121    2.5481
9          0.9762       1.5397    3.3231
10         0.7813       1.6030    3.0088
Average    0.8933       1.4733    2.7587
Std Dev    0.2025       0.2360    0.4524

Prediction of CO2 is thus compared across simple linear regression (referred to as SLR from now on), the backpropagation NN and the NNSOA. For the NNSOA method, the average RMSE on the testing sets is 0.8933, while the averages for NS and SLR are 1.4733 and 2.7587, which are much higher. One possible reason why the NNSOA outperformed the backpropagation NN could be its ability to eliminate unnecessary weights in the solution. Backpropagation, which is based on derivatives, requires weights in a solution to be nonzero. Therefore, any unnecessary weight has to find other unnecessary weights in the same solution to zero out each other's contribution to the NN estimate. By doing so, no additional error is introduced into the estimates produced for the training set. However, once the solution is applied to the testing set, these weights are unlikely to continue to zero out, adding additional error to the estimate. Table 4 shows the topology of the networks.

Table 4 Network Topology

Data Run   NNSOA Hidden   NS Hidden   SLR Hidden   NNSOA Weights   NS Weights   SLR Weights
1          66             90          NA           383             1530         16
2          66             113         NA           285             1921         16
3          66             142         NA           391             2414         16
4          66             93          NA           281             1581         16
5          60             121         NA           296             2057         16
6          66             143         NA           387             2431         16
7          84             136         NA           406             2312         16
8          132            145         NA           397             2465         16
9          66             134         NA           346             2278         16
10         66             61          NA           350             1037         16
Average    73.80          117.80      NA           352.20          2002.60      16.00
Std Dev    21.36          28.29       NA           48.71           480.89       0.00

Figure 3 compares the predictive capability of the different methods by plotting the real (observed) values of CO2 emissions against those predicted by each method. The graph indicates that the predictions of the NNSOA method are better than those of the regression and backpropagation methods.

Figure 3 Observed vs. Predicted CO2 Emissions

Figure 4 shows how the NNSOA method predicts CO2 emissions in comparison to the observed values, with an R-squared value of 0.9608.

Figure 4 NNSOA CO2 Prediction vs. Observed

When the SLR method is used, as shown in Figure 5 below, the R-squared value is 0.6459, indicating a less accurate prediction.

Figure 5 SLR CO2 Prediction vs. Observed

Although the NNSOA has the ability to identify irrelevant inputs by searching for and finding solutions that zero out all connections to a specific input, for this dataset it was found in all 10 runs that every input was used for prediction.

6.3 Sensitivity analysis

The following four figures describe the sensitivity of CO2 emissions to the input variables. As shown in Figure 6, GDP is the dominant input variable, and CO2 emissions increase as GDP increases, as discussed before. CO2 emissions also increase with increases in IND, whereas they decrease with increases in SRV, TRD and ILR.

Figure 6 CO2 Emissions and GDP
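The paper does not spell out the sensitivity procedure; a common one-at-a-time approach consistent with the description (vary one input over its observed range while holding the others at their averages) is sketched below, with predict standing in for the trained model's prediction function. All names are illustrative assumptions.

```python
import numpy as np

def sensitivity_curve(predict, X, input_index, n_points=25):
    """One-at-a-time sensitivity: hold all inputs at their mean, sweep one input
    over its observed range, and record the predicted CO2 emissions."""
    base = X.mean(axis=0)
    grid = np.linspace(X[:, input_index].min(), X[:, input_index].max(), n_points)
    rows = np.tile(base, (n_points, 1))
    rows[:, input_index] = grid                  # vary only the chosen input
    return grid, predict(rows)
```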

As shown below, the Latin America and South America (LSA) region produces more CO2 emissions than the North America (NA) region at average values of GDP, IND, SRV, TRD and ILR. While the Europe and West Asia (EUR) region has the lowest CO2 emissions, the Sub Saharan (SSA), South Asia (SOA) and Middle East and North Africa (MENA) regions are comparable to the North America (NA) region.

Figure 7 CO2 Emissions by Region

The bottom 10 and top 10 countries in per capita CO2 emission efficiency compared to the U.S.A. are shown in the following two figures, for average values of GDP, IND, SRV, TRD and ILR.

Figure 8 Bottom Ten Countries in Terms of CO2 Emission Efficiency

Figure 9 Top Ten Countries in Terms of CO2 Emission Efficiency

Though China is the number one producer of total CO2 emissions in aggregate, it is also the most populated country in the world, so its per capita CO2 emissions are low and its CO2 emission efficiency is therefore considered high. The following figure describes how CO2 emissions in China vary with the variables GDP, IND, SRV, TRD and ILR. CO2 emissions in China increase with GDP and IND, with IND being the most predominant input variable, whereas emissions decrease as ILR increases. CO2 emissions in China also show a very weak positive relationship with SRV and TRD. A higher ILR indicates lower CO2 emissions, due to less usage of energy sources that produce CO2 emissions.

Figure 10 CO2 Emissions in China: Sensitivity by Input Variable

7 Conclusions

CO2 emissions by countries around the world are a major source of the current global warming phenomenon, which has severe consequences for human life as we know it. This research work attempts to identify the macro-economic input variables that have an impact on the prediction of CO2 emissions. These input variables are identified as gross domestic product (GDP), manufacturing industry (IND) as a percentage of GDP, service industry (SRV) as a percentage of GDP, trade (TRD) as a percentage of GDP and adult illiteracy rate (ILR). CO2 emissions are predicted using three different methods: the NNSOA neural network, a backpropagation-trained neural network and simple linear regression. This work shows that the neural network method predicts CO2 emissions more accurately than the other two methods. It also identifies all input variables (GDP, IND, SRV, TRD and ILR) as significant in explaining the dependent variable, CO2 emissions. The R-squared value between observed and predicted CO2 using the neural network method is about 0.96, indicating a good fit between observed and predicted values of CO2 emissions. Sensitivity analysis of the input variables indicates that GDP and IND are the predominant variables causing increases in CO2 emissions, in agreement with what was hypothesized in earlier sections of this paper. One of the input variables is insignificant. Sensitivity analysis of different regions indicates that all regions around the world contribute

CO2 emissions significantly, but CO2 emissions from Europe are much lower than from other regions. This may be due to the fact that Europe has been very progressive in recognizing the global warming problem caused by CO2 emissions and in its subsequent efforts to curb these emissions. Sensitivity analysis also indicates that some countries are better than the U.S.A. in CO2 emission efficiency and others are worse. This indicates that some countries may need to put in more effort than others in order to address global warming due to CO2 emissions and to have a meaningful impact in curbing climate change and the subsequent human suffering due to global warming. Sensitivity analysis of CO2 emissions by China indicates that its recent increased activity in manufacturing and industrialization has a significant impact in increasing that country's CO2 emissions. As the economy in China grows further due to increased activity in manufacturing and industrialization, China should be cognizant of its impact on global warming and should take steps to curb CO2 emissions by embracing green technologies such as wind, solar and nuclear to produce energy. It is shown that the neural network method is a robust method to predict CO2 emissions when the input variables are GDP, IND, SRV, TRD and ILR for different regions and countries of the world, and that this method performs better than simple linear regression.

Table 2c NN Country Codes

Each of the 113 countries is assigned a numerical code from 0 (Albania) to 112 (Zimbabwe) in alphabetical order. Inputs cc1 through cc7 hold the 7-bit binary representation of this code, with cc1 as the most significant bit (for example, Benin, code 8, is coded 0 0 0 1 0 0 0 across cc1 through cc7).

0 ALB Albania; 1 DZA Algeria; 2 ATG Antigua and Barbuda; 3 ARG Argentina; 4 AUT Austria; 5 BGD Bangladesh; 6 BEL Belgium; 7 BLZ Belize;
8 BEN Benin; 9 BTN Bhutan; 10 BOL Bolivia; 11 BWA Botswana; 12 BRA Brazil; 13 BGR Bulgaria; 14 BFA Burkina Faso; 15 BDI Burundi;
16 CMR Cameroon; 17 CPV Cape Verde; 18 CAF Central African Republic; 19 TCD Chad; 20 CHL Chile; 21 CHN China; 22 COL Colombia; 23 COM Comoros;
24 ZAR Congo, Dem. Rep.; 25 COG Congo, Rep.; 26 CRI Costa Rica; 27 CIV Cote d'Ivoire; 28 DNK Denmark; 29 DMA Dominica; 30 DOM Dominican Republic; 31 ECU Ecuador;
32 EGY Egypt, Arab Rep.; 33 ETH Ethiopia; 34 FIN Finland; 35 FRA France; 36 GAB Gabon; 37 GMB Gambia, The; 38 GHA Ghana; 39 GRC Greece;
40 GRD Grenada; 41 GTM Guatemala; 42 GIN Guinea; 43 GNB Guinea-Bissau; 44 GUY Guyana; 45 HND Honduras; 46 HKG Hong Kong, China; 47 HUN Hungary;
48 ISL Iceland; 49 IND India; 50 IDN Indonesia; 51 IRN Iran, Islamic Rep.; 52 IRL Ireland; 53 ITA Italy; 54 JAM Jamaica; 55 JPN Japan;
56 JOR Jordan; 57 KEN Kenya; 58 KOR Korea, Rep.; 59 LAO Lao PDR; 60 LUX Luxembourg; 61 MDG Madagascar; 62 MWI Malawi; 63 MYS Malaysia;
64 MLI Mali; 65 MRT Mauritania; 66 MUS Mauritius; 67 MEX Mexico; 68 MNG Mongolia; 69 MAR Morocco; 70 MOZ Mozambique; 71 NPL Nepal;
72 NLD Netherlands; 73 NIC Nicaragua; 74 NER Niger; 75 NGA Nigeria; 76 NOR Norway; 77 OMN Oman; 78 PAK Pakistan; 79 PAN Panama;
80 PRY Paraguay; 81 PER Peru; 82 PHL Philippines; 83 PRT Portugal; 84 RWA Rwanda; 85 STP Sao Tome and Principe; 86 SAU Saudi Arabia; 87 SEN Senegal;
88 SYC Seychelles; 89 ZAF South Africa; 90 ESP Spain; 91 LKA Sri Lanka; 92 KNA St. Kitts and Nevis; 93 LCA St. Lucia; 94 VCT St. Vincent and the Grenadines; 95 SUR Suriname;
96 SWZ Swaziland; 97 SWE Sweden; 98 SYR Syrian Arab Republic; 99 THA Thailand; 100 TGO Togo; 101 TTO Trinidad and Tobago; 102 TUN Tunisia; 103 TUR Turkey;
104 UGA Uganda; 105 ARE United Arab Emirates; 106 GBR United Kingdom; 107 USA United States; 108 URY Uruguay; 109 VEN Venezuela, RB; 110 VNM Vietnam; 111 ZMB Zambia; 112 ZWE Zimbabwe

References

Ackerman F, Stanton EA (2008) The cost of climate change. Natural Resources Defense Council, May 2008.
Azomahou T, Laisney F, Van PN (2006) Economic development and CO2 emissions: a nonparametric panel approach. Journal of Public Economics; 90:1347-63.
Baum EB, Haussler D (1989) What size net gives valid generalization? Neural Computation; 1:151-160.
Burkitt AN (1991) Optimization of the architecture of feed-forward neural nets with hidden layers by unit elimination. Complex Systems; 5:371-80.
Dasgupta S, Laplante B, Wang H, Wheeler D (2002) Confronting the environmental Kuznets curve. Journal of Economic Perspectives; 16(1):147-68.
Doran PT, Zimmerman MK (2009) Examining the scientific consensus on climate change. EOS, Weekly Newspaper of the American Geophysical Union (AGU); 90(3):21-2.
Dorsey RE, Johnson JD, Mayer WJ (1994) A genetic algorithm for the training of feed forward neural networks. In: Johnson JD, Whinston AB, editors. Advances in artificial intelligence in economics, finance, and management, vol. 1. Greenwich, CT: JAI Press Inc.; p. 93-111.
Dorsey RE, Johnson JD, Van Boening MV (1994) The use of artificial neural networks for estimation of decision surfaces in first price sealed bid auctions. In: Cooper WW, Whinston AB, editors. New Direction in Computational Economics. Netherlands: Kluwer Academic Publishers; p. 19-40.
Dorsey RE, Johnson JD (1997) Evolution of dynamic reconfigurable neural networks: energy surface optimality using genetic algorithms. In: Levine DS, Elsberry WR, editors. Optimality in biological and artificial networks. Hillsdale, NJ: Lawrence Erlbaum Associates; p. 185-202.
Drucker H, LeCun Y (1992) Improving generalization performance using double back propagation. IEEE Transactions on Neural Networks; 3:991-7.
Environmental Protection Agency (EPA) (2010) Emissions defined. http://www.epa.gov/climatechange/emissions/index.html#ggo
Gupta JND, Sexton RS, Tunc EA (2000) Selecting scheduling heuristic using neural networks. INFORMS Journal on Computing; 12(2):150-62.
Hill RJ, Magnani E (2002) An exploration of the conceptual and empirical basis of the environmental Kuznets curve. Australian Economic Papers; 41(2):239-54.
Karmin ED (1990) A simple procedure for pruning back-propagation trained networks. IEEE Transactions on Neural Networks; 1:239-42.
Kruschke JK (1989) Distributed bottlenecks for improved generalizations in back-propagation networks. International Journal of Neural Networks Research and Applications; 1:187-93.
Pachauri RK, Reisinger A (Eds.) (2007) IPCC 4th Assessment Report: Climate Change 2007: Synthesis Report.
Sexton RS, Dorsey RE, Johnson JD (1998) Toward a global optimum for neural networks: a comparison of the genetic algorithm and back propagation. Decision Support Systems; 22:171-85.
Sexton RS, Dorsey RE, Sikander NA (2002) Simultaneous optimization of neural network function and architecture algorithm. Decision Support Systems; 1034.
Sexton RS, Sriram RS, Etheridge H (2003) Improving decision effectiveness of artificial neural networks: a modified genetic approach. Decision Sciences; 34(3):421-42.
Union of Concerned Scientists (2006) Top 20 countries 2006 CO2 emissions. http://www.ucsusa.org/global_warming/science_and_impacts/science/graph-showing-eachcountrys.html
