International Seminar on the Frontiers of Data Science
2018.09.23-09.24
Opening Ceremony
Time: Sep 23, 2018 (Sunday) 13:00-14:00
Venue: Room 109, HongYuan Building
Time | Speaker | Event
13:00-13:30 | Registration for the Workshop
13:30-13:40 | GUO Jianjun (郭建军) | Opening Speech
13:40-13:50 | Ostap Okhrin | Speech
13:50-14:00 | Group Photo
Parallel Sessions
Session 1
Time: Sep 23, 2018 (Sunday) 14:00-15:30
Venue: Room 109, HongYuan Building
Session Chair: Ostap Okhrin
Time | Speaker | Title | School / Institution
14:00-14:30 | XIAO Feng (肖峰) | Day-to-day Flow Dynamics for Stochastic User Equilibrium and A General Lyapunov Function | School of Business Administration, SWUFE
14:30-15:00 | Ostap Okhrin | Flexible HAR Model for Realized Volatility | Institute of Transport and Economics, TUD
15:00-15:30 | XIAO Hui (肖辉) | Ranking and Selection with Input Uncertainty | School of Statistics, SWUFE
15:30-15:50 | Coffee Break
Session 2
Time: Sep 23, 2018 (Sunday) 15:50-17:20
Venue: Room 109, HongYuan Building
Session Chair: CHEN Xuerong (陈雪蓉)
Time | Speaker | Title | School / Institution
15:50-16:20 | Georg Hirte | International Trade, Geographic Heterogeneity and Interregional Inequality | Institute of Transport and Economics, TUD
16:20-16:50 | CHEN Xuerong (陈雪蓉) | Integrated Powered Density: Screening Ultrahigh Dimensional Covariates with Survival Outcomes | School of Statistics, SWUFE
16:50-17:20 | Regine Gerike | Travel Behavior in Urban Areas: Data, Methods, Findings | Institute of Transport Planning and Road Traffic, TUD
17:30-19:30 | Welcome Dinner
Session 3
Time: Sep 24, 2018 (Monday) 9:30-10:30
Venue: Room 109, HongYuan Building
Session Chair: GUO Mengmeng (郭萌萌)
Time | Speaker | Title | School / Institution
9:30-10:00 | Bernhard Schipp | Time Dependent Return Distributions, Nonlinear Fokker-Planck Dynamics and the Tsallis Entropy | Institute of Business and Economics, TUD
10:00-10:30 | GUO Mengmeng (郭萌萌) | Does Air Pollution Affect Stock Returns? Evidence from China | Institute of Economics and Management, SWUFE
10:30-10:50 | Coffee Break
Session 4
Time: Sep 24, 2018 (Monday) 10:50-12:20
Venue: Room 109, HongYuan Building
Session Chair: SUN Xiuli (孙秀丽)
Time | Speaker | Title | School / Institution
10:50-11:20 | SUN Xiuli (孙秀丽) | Firm-level Human Capital and Innovation: Evidence from China | School of Statistics, SWUFE
11:20-11:50 | ZHANG Jia (张佳) | High Dimensional Elliptical Sliced Inverse Regression in non-Gaussian Distributions | School of Statistics, SWUFE
11:50-12:20 | Stefanie Lösch | Measuring Regional Environmental Awareness by Using Internet Query Data | Institute of Transport and Economics, TUD
12:20-14:00 | Lunch Break: Liulin Restaurant
Session 5
Time: Sep 24, 2018 (Monday) 14:00-15:30
Venue: Room 109, HongYuan Building
Session Chair: YANG Dong (杨冬)
Time | Speaker | Title | School / Institution
14:00-14:30 | Mr. Stephan Hocke | Optimize the Optimization – Parameter Tuning of a Stochastically Metaheuristic | Institute of Transport and Economics, TUD
14:30-15:00 | Mr. YANG Dong (杨冬) | A Misspecification Test for the Higher Order Comoments of the Factor Model | School of Statistics, SWUFE
15:00-15:30 | Mr. Manuel Schmid | Estimating Higher Moments with High Frequency Returns | Institute of Transport and Economics, TUD
15:30-15:50 | Coffee Break
Session 6
Time: Sep 24, 2018 (Monday) 15:50-16:50
Venue: Room 109, HongYuan Building
Session Chair: Sophie Häse
Time | Speaker | Title | School / Institution
15:50-16:20 | Sophie Häse | The Impact of Unexpected and Recurring Flooding Events on House Prices | Institute of Transport and Economics, TUD
16:20-16:50 | WANG Minke (王旻轲) | Modelling and Solving the Location Inventory Problem with Stochastic Demand Considering Carbon Cap-and-Trade | School of Statistics, SWUFE
17:30-20:30 | Dinner in the City Center
Title & Abstract
TUD side:
# Prof. Dr. Ostap Okhrin
Title: Flexible HAR Model for Realized Volatility
Co-Authors: Francesco Audrino and Chen Huang (University of St. Gallen)
Abstract: The Heterogeneous Autoregressive (HAR) model is commonly used in modeling the dynamics of realized volatility. In this paper, we propose a flexible HAR(1,...,p) specification, employing the adaptive LASSO and its statistical inference theory to see whether the lag structure (1, 5, 22) implied from an economic point of view can be recovered by statistical methods. The model differs from Audrino and Knaus (2016), where the authors apply LASSO to the AR(p) model, which does not necessarily lead to a HAR model. Adaptive LASSO estimation and the subsequent hypothesis testing results fail to show strong evidence that such a fixed lag structure can be recovered by a flexible model. We also apply the group LASSO and related tests to check the validity of the classic HAR, which is rejected in most cases. The results justify our intention to use a flexible lag structure while still keeping the HAR frame. In terms of out-of-sample forecasting, the proposed flexible specification works comparably to the benchmark HAR(1, 5, 22). Moreover, the time-varying model combinations show that when the market environment is not stable, the fixed lag structure (1, 5, 22) is not particularly accurate and effective.
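As a rough illustration of the lag structure (1, 5, 22) mentioned in the abstract, the sketch below builds the three regressors of the classic HAR model from a realized-volatility series: the previous day's value and the averages over the previous 5 and 22 days. The function name `har_regressors` and the toy series are assumptions made for illustration and are not taken from the paper; the adaptive LASSO and group LASSO steps are not shown.

```python
def har_regressors(rv, lags=(1, 5, 22)):
    """Return (X, y) for a HAR-type regression of rv[t] on lagged averages.

    Each row of X holds, for one target day t, the mean of rv over the
    previous k days for every k in `lags`; y holds the target rv[t].
    """
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(rv)):
        # rv[t - k:t] is the window of the k days immediately before day t.
        row = [sum(rv[t - k:t]) / k for k in lags]
        X.append(row)
        y.append(rv[t])
    return X, y


# Toy series (hypothetical values): 30 days of made-up realized volatility.
rv = [0.01 * (1 + (i % 7)) for i in range(30)]
X, y = har_regressors(rv)
print(len(X), X[0])
```

The rows of `X` could then be passed, together with a column of ones, to any least-squares or penalized regression routine; passing a different `lags` tuple gives a more flexible lag structure of the kind the abstract discusses.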
# Prof. Dr. Georg Hirte
Title: International Trade, Geographic Heterogeneity and Interregional Inequality
Abstract: We study the effect of international trade on interregional inequality from 1992 to 2012 within almost all countries of the world using satellite night-light based inequality proxies. For our analysis, we develop novel indicators for within-country trade cost heterogeneities that are based on exogenous geographical features. In order to deal with potential endogeneity issues, we utilize the occurrence of large natural disasters striking trade partners as an instrument to generate exogenous variation in trade flows. In contrast to previous results, our IV estimates reveal that international trade aggravates economic disparities only in those countries that have higher within-country heterogeneity in their access to the world market and their within-country trade costs.
# Prof. Dr. Regine Gerike
Title: Travel Behavior in Urban Areas: Data, Methods, Findings
Co-Author: Rico Witter
Abstract: Cross-sectional household travel surveys (HTS) are the main data source for analyzing travel behavior. HTS are traditionally based on mixed-mode data collection, but innovative methods such as smartphone-based GPS tracking are increasingly applied. This talk first gives an overview of available data sources in Germany and in the international context, including examples of HTS datasets and the methods used for data collection. In the second part, insights on travel behavior with a focus on car use and the peak-car phenomenon are presented, using the example of historical HTS analysis for the five European capital cities Berlin, Copenhagen, London, Paris and Vienna. The peak-car phenomenon and its drivers are described based on descriptive statistics and Age-Period-Cohort Analysis.
# Prof. Dr. Bernhard Schipp
Title: Time Dependent Return Distributions, Nonlinear Fokker-Planck Dynamics and the Tsallis Entropy
Co-Author: Sabine Hegewald
Abstract: Econometric analysis of high frequency stock market data that is typically based on one or more Brownian motions lacks the ability to control for abnormal, i.e. time dependent, changes in the noise distribution. In this paper, an approach originally pursued by Barndorff-Nielsen and Shephard (2000) is extended to nonlinear Fokker-Planck equations of which the Tsallis distribution with time-varying parameters may be regarded as a particular solution. The resulting Tsallis density is able to model even distributions with extreme leptokurtosis in an adequate way. Additionally, relations between the Tsallis distribution and GARCH(1,1)- and GJR(1,1)-models are discussed.
# Mrs. Sophie Häse
Title: The Impact of Unexpected and Recurring Flooding Events on House Prices
Abstract: We study the causal impact of an unexpected major flood event and a sequence of river floods on house prices. Previous literature mainly investigated the impact of single events (redefining floodplains, hurricanes, inundation) in the USA, finding a negative but temporary impact on housing prices within floodplains (Bin and Landry, 2012; Atreya et al., 2013; Daniel et al., 2009). More recent studies provide heterogeneous effects (Zhang, 2016). There is also a scarce literature on the causal effects of river floods on house prices in actually inundated land parcels (Atreya and Ferreira, 2015).
We investigate the causal effects of inundation on house prices, the time pattern and heterogeneity across house types, and ask whether recurrence matters. Our study area is Dresden, a German city with 540,000 inhabitants which is spread along the Elbe river. Dresden constitutes a specific case due to the unexpected major flood event in 2002 (classified as HQ500 before 2002; afterwards HQ100) and on account of subsequent events: an HQ20 flood in 2006 and an HQ50 event in 2013. Further, the housing market in Germany is not comparable to those of most countries because more than 50% of all flats are rented. We thus also study the impact on prices of houses with rented flats.
We use a unique data set of all transactions on the Dresden housing market from 2000 to 2017 including also houses with rented flats. The inclusion of comprehensive geodata allows us to consider location, elevation, urban amenities and local public goods and differentiate between contrary effects like the proximity to water and the risk of flooding.
# Mrs. Stefanie Lösch
Title: Measuring Regional Environmental Awareness by Using Internet Query Data
Co-Authors: Ostap Okhrin, Hans Wiesmeth
Abstract: "Global climate change will affect the Russian Federation in particular: with regions in permafrost areas, large forested areas, and an agriculture adjusted to the current climatic conditions, Russia will be confronted with consequences of climate change on a large scale. Are citizens sufficiently aware of these challenges in order to provoke necessary support from the public administration?
In this paper, we estimate awareness indices for 81 regions and 28 months, ranging from January 2014 to April 2016, by using a Multiple-Indicator-Multiple-Causes (MIMIC) model. Dependent indicators are derived from the number of certain queries in the search engine Yandex, whereas exogenous causes of environmental awareness are assumed to be characteristics of the Russian regions. The estimated awareness time series reveal seasonal effects, especially a high interest in environmental topics in the winter months, as well as a negative correlation with the regional temperature. The estimated awareness index is larger for the regions in the cold north than in the warmer south of the country. Geographical groups with similar awareness structures are found by using the k-means algorithm. Moreover, a positive dependence between the level of awareness and regional GRP per capita can be shown."
# Mr. Stephan Hocke
Title: Optimize the Optimization – Parameter Tuning of a Stochastically Metaheuristic
Abstract: Optimization problems arise in various contexts, and the proven optimal solution cannot always be determined within a justifiable amount of computation time or resources. Especially for discrete problems, heuristic procedures are required. The sequence of events and the solution quality of metaheuristics and artificial intelligence methods are not predetermined and mostly depend on randomness. Hence, different seeds of the random generator result in different solutions for the same problem. Consequently, finding the best parameter setting of a metaheuristic (e.g. genetic algorithm – recombination type/mutation probability/mutation type etc., simulated annealing – temperature function/iterations, tabu search – acceptance probability/size of tabu list etc.) is a non-trivial task and represents an optimization problem itself. Since the literature focuses on proofs of concept and the necessary computational effort is prohibitively expensive with regard to the desired publication, finding the right parameter setting plays a subordinate role. Consequently, the majority of provided parameter settings are arbitrary or unfounded. This paper presents an exemplary parameter tuning of a Particle Swarm Optimization (PSO) developed for the Vehicle Routing Problem with Temporal Synchronization Constraints. In the process, it is evaluated whether there is any generalizable "best" parameter setting, or whether this depends on the problem size and/or structure.
# Mr. Manuel Schmid
Title: Estimating Higher Moments with High Frequency Returns
Co-Authors: Ostap Okhrin, Michael Rockinger
Abstract: In standard return modelling approaches, returns are often assumed to follow a normal distribution. This assumption implies zero skewness as well as zero excess kurtosis. Neither of these implications corresponds to empirical observation, which eventually leads to problems, e.g. in financial risk management. On the other hand, the typical non-parametric estimation of these values requires a huge amount of data to be reliable. For this reason, it is advisable to exploit the availability of high frequency data and construct estimators in the fashion of the well-known realized variance. In this paper, an estimation approach presented by Neuberger and Payne (2018) is extended to non-martingale price processes. On the basis of Monte Carlo simulations, we show that our estimators are unbiased and consistent when the underlying price process can be modelled as a stochastic volatility jump diffusion process. Distribution properties of the estimators are discussed.
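For readers unfamiliar with the realized variance referred to in the abstract, a minimal sketch follows: it sums squared intraday log returns over one day. The function name `realized_variance` and the toy prices are assumptions for illustration only; the higher-moment estimators of Neuberger and Payne (2018) and the extension proposed in the talk are not reproduced here.

```python
import math


def realized_variance(prices):
    """Sum of squared log returns computed from one day's intraday prices."""
    # Log return between each pair of consecutive price observations.
    returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    return sum(r * r for r in returns)


# Hypothetical intraday prices sampled at regular intervals.
prices = [100.0, 100.5, 100.2, 100.9, 100.7]
print(realized_variance(prices))
```

A flat price path gives a realized variance of zero, and finer sampling of the same day simply adds more squared-return terms to the sum.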
SWUFE side:
# Prof. Dr. XIAO Hui
Title: Ranking and Selection with Input Uncertainty
Abstract: In this research, we consider the ranking and selection (R&S) problem with input uncertainty. It seeks to maximize the probability of correct selection (PCS) for the best design under a fixed simulation budget, where the performance of each design is measured by its worst-case performance. To reduce the complexity of the PCS, we develop an approximated probability measure for it and derive an asymptotically optimal solution of the resulting problem. An efficient selection procedure is then designed within the optimal computing budget allocation (OCBA) framework. More importantly, we provide some useful insights on characterizing an efficient robust selection rule and how it can be achieved by adjusting the simulation budgets allocated to each scenario.
# Prof. Dr. GUO Mengmeng
Title: Does Air Pollution Affect Stock Returns? Evidence from China
Abstract: Building on research linking environmental factors to investors’ sentiments and the local bias literature, we posit that there is a negative relation between air pollution and a firm’s stock return. Consistent with our hypothesis, we find that firms located in cities that experience higher levels of air pollution exhibit lower stock returns and lower trading volumes. In line with our central hypothesis, we observe that the effect of air pollution on stock returns is stronger among firms that are more likely to be held by local investors. Our results hold across alternative measures of air pollution and are not sensitive to the location of the firm, the city size, the pollution level, and the air pollution standards. Moreover, the results remain robust after addressing the endogeneity issue by controlling for firm fundamental factors. This study is the first to establish an association between air pollution and local stock returns.
# Prof. Dr. SUN Xiuli
Title: Firm-level Human Capital and Innovation: Evidence from China
Abstract: This paper explores the role of human capital in firms’ innovation. Based on a World Bank survey of manufacturing firms in China, we use two firm-level datasets: one from the larger metropolitan cities, and one from smaller and mid-sized cities. Patents are used as an indicator of innovation. The human capital indicators we use include the number of highly educated workers, the general manager’s education and tenure, and the management team’s education and age. We use the Negative Binomial and Instrumental Variables estimators to estimate patent production function models that are augmented by our human capital variables. We also use the zero-inflated Poisson model to examine the likelihood of innovation. We find that the human capital indicators play an important role in influencing patenting, and that some of the human capital variables appear to have a greater impact on patenting in the smaller and mid-sized cities.
Our human capital estimates are obtained after controlling for firms’ R&D, size, market share, age, and foreign ownership, as well as fixed effects to control for industry-specific characteristics, and firms’ location and geography. We comment on how our findings play into China’s policies related to innovation and human capital formation.
# Prof. Dr. XIAO Feng
Title: Day-to-day Flow Dynamics for Stochastic User Equilibrium and A General Lyapunov Function
Abstract: This study establishes a general framework for continuous day-to-day models to capture the perceptual errors in travelers’ day-to-day route choice behavior. As the counterpart of the Beckmann transformation (Beckmann et al., 1956), which has been widely used as a candidate Lyapunov function to prove the stability of continuous day-to-day traffic evolution models that converge to deterministic user equilibrium (DUE), Fisk’s formulation (Fisk, 1980; Watling and Cantarella, 2013) is utilized in our study as a general Lyapunov function for the day-to-day models that converge to stochastic user equilibrium (SUE), so far as the path flow growth rates and the “potentials” of the paths satisfy the condition of negative correlation. A sufficient condition which guarantees the nonnegativity of the path flow is also provided. The logit dynamic (Sandholm, 2010), the logit-based Smith dynamic (Smith and Watling, 2016) and the logit-based BNN dynamic (Brown and Von Neumann, 1950) are given as three examples under this framework. Moreover, we extend the second-order day-to-day model in Xiao et al. (2016) to SUE. Some properties of the new model, such as fixed point and stability, are investigated. Interestingly, we find that even when the model converges to SUE, the path flows could still go negative during the oscillation under extreme situations. A numerical experiment is conducted to demonstrate the existence of negative path flow for the second-order model.
# Prof. Dr. CHEN Xuerong
Title: Integrated Powered Density: Screening Ultrahigh Dimensional Covariates with Survival Outcomes
Abstract: Modern biomedical studies have yielded abundant survival data with high-throughput predictors. Variable screening is a crucial first step in analyzing such data, for the purpose of identifying predictive biomarkers, understanding biological mechanisms and making accurate predictions. To nonparametrically quantify the relevance of each candidate variable to the survival outcome, we propose integrated powered density (IPOD), which compares the differences in the covariate-stratified distribution functions. The proposed new class of statistics, with a flexible weighting scheme, is general and includes the Kolmogorov statistic as a special case. Moreover, the method does not rely on rigid regression model assumptions and can be easily implemented. We show that our method possesses sure screening properties, and confirm the utility of the proposal with extensive simulation studies. We apply the method to analyze a multiple myeloma study on detecting gene signatures for cancer patients' survival.
# Mr. YANG Dong
Title: A Misspecification Test for the Higher Order Comoments of the Factor Model
Abstract: The traditional estimation of higher order co-moments of non-normal random variables by the sample analog of the expectation faces a curse of dimensionality, as the number of parameters increases steeply when the dimension increases. Imposing a factor structure on the process solves this problem; however, it leads to the challenging task of selecting an appropriate factor model. This paper contributes by proposing a test that exploits the following feature: when the factor model is correctly specified, the higher order co-moments of the unexplained return variation are sparse. It recommends a general to specific approach for selecting the factor model by choosing the most parsimonious specification for which the sparsity assumption is satisfied. This approach uses a Wald or Gumbel test statistic for testing the joint statistical significance of the co-moments that are zero when the factor model is correctly specified. The asymptotic distribution of the test is derived. An extensive simulation study confirms the good finite sample properties of the approach. This paper illustrates the practical usefulness of factor selection on daily returns of random subsets of S&P 100 constituents.
# Mr. WANG Minke
Title: Modelling and Solving the Location Inventory Problem with Stochastic Demand Considering Carbon Cap-and-Trade
Abstract: We address a multi-period facility location-inventory problem with the consideration of carbon emissions in a multi-echelon supply chain network consisting of plants, potential DCs, and retailers. Given the hierarchical structure of the problem, a two-stage stochastic mathematical model is presented to integrate the inventory planning decisions, made under a (t,s,S) inventory policy, with the location-allocation decisions to deal with nonstationary demand. A linear approximation technique and the sample average approximation method are used to increase the tractability of the stochastic program. Because the problem is NP-hard, a three-step hierarchical metaheuristic algorithm is proposed to solve the model. Numerical experiments are conducted to validate the modelling and the three-step algorithm. In addition, the impact of problem sizes, demand types and cost structures on the supply chain design solution and its cost breakdown is presented to give managerial insights.
# Mrs. ZHANG Jia
Title: High Dimensional Elliptical Sliced Inverse Regression in non-Gaussian Distributions
Abstract: Sliced inverse regression (SIR) is the most widely used sufficient dimension reduction method due to its simplicity, generality and computational efficiency. However, when the distribution of the covariates deviates from the multivariate normal distribution, the estimation efficiency of SIR is rather low. In this paper, we propose a robust alternative to SIR, called elliptical sliced inverse regression (ESIR), for analyzing high dimensional, elliptically distributed data. Elliptically distributed data have wide applications, especially in finance and economics, where the distribution of the data is often heavy-tailed. To tackle heavy-tailed elliptically distributed covariates, we utilize the multivariate Kendall’s tau matrix in the framework of a so-called generalized eigenvector problem for sufficient dimension reduction. Methodologically, we present a practical algorithm for our method. Theoretically, we investigate the asymptotic behavior of the ESIR estimator in the high dimensional setting. Extensive simulation results show that ESIR significantly improves the estimation efficiency in heavy-tailed scenarios. Analysis of two real data sets also demonstrates the effectiveness of our method. Moreover, ESIR can be easily extended to most other sufficient dimension reduction methods and applied to non-elliptical heavy-tailed distributions.