Read by QxMD icon Read

Big Data

Kristina Gavin Bigsby, Jeffrey W Ohlmann, Kang Zhao
Social media provides a platform for individuals to craft personal brands and influence their perception by others, including potential employers. Yet there remains a need for more research investigating the relationship between individuals' online identities and offline outcomes. This study focuses on the context of college football recruiting, specifically on the relationship between recruits' Twitter activities and coaches' scholarship offer decisions. Based on impression management theory, we analyze content posted by recruits and apply machine learning to identify instances of self-promotion and ingratiation in 5...
March 13, 2019: Big Data
Iman Behravan, Seyed Hamid Zahiri, Seyed Mohammad Razavi, Roberto Trasarti
Recently, professional team sport organizations have invested their resources to analyze their own and opponents' performance. So, developing methods and algorithms for analyzing team sports has become one of the most popular topics among data scientists. Analyzing football is hard because of its complexity, number of events in each match, and constant flow of circulation of the ball. Finding roles of players with the purpose of analyzing the performance of a team or making a meaningful comparison between players is crucial...
February 15, 2019: Big Data
Uwe Dick, Ulf Brefeld
We investigate how to learn functions that rate game situations on a soccer pitch according to their potential to lead to successful attacks. We follow a purely data-driven approach using techniques from deep reinforcement learning to valuate multiplayer positionings based on positional data. Empirically, the predicted scores highly correlate with dangerousness of actual situations and show that rating of player positioning without expert knowledge is possible.
January 23, 2019: Big Data
Gavin A Whitaker, Ricardo Silva, Daniel Edwards
We consider the task of determining the number of chances a soccer team creates, along with the composite nature of each chance: the players involved and the locations on the pitch of the assist and the chance. We infer this information using data consisting solely of attacking events, which the authors believe to be the first approach of its kind. We propose an interpretable Bayesian inference approach and implement a Poisson model to capture chance occurrences, from which we infer team abilities. We then use a Gaussian mixture model to capture the areas on the pitch a player makes an assist/takes a chance...
December 1, 2018: Big Data
Andrew Urbaczewski, Ryan Elmore
Fantasy sports are a popular way for individuals to add another layer of enjoyment to their interest in sports. While fantasy sports have been around for many years, access to big data sets and computer power to process them is a relatively new phenomenon, as well as the ability to compete in daily competitions and not just season-long campaigns. We posit that access to new and yet unforeseen data, models, and computing power to manage it, when viewed through the lens of efficient market hypothesis, will cause the daily fantasy sports market to change dramatically...
November 16, 2018: Big Data
Gary Smith, Andrew Capron
Football scores are an imperfect measure of a team's ability, and consequently exaggerate differences in abilities. Those teams that perform the best and the worst are not really so far from average in their ability; thus their future performances regress to the mean. Betting data indicate that gamblers do not fully account for this regression.
November 14, 2018: Big Data
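The regression-to-the-mean argument in the abstract above can be illustrated with a minimal sketch. It assumes a simple normal model in which an observed score margin equals true ability plus independent noise; this model, the function name `expected_ability`, and the variance figures are illustrative assumptions, not the authors' method or data.

```python
def expected_ability(observed, league_mean, var_ability, var_noise):
    """Expected true ability given one observed score margin.

    Assumes (hypothetically) observed = ability + noise, with both terms
    normally distributed and independent. Under that model the best
    estimate shrinks the observation toward the league mean.
    """
    # Share of the observed variance that reflects real ability differences.
    reliability = var_ability / (var_ability + var_noise)
    return league_mean + reliability * (observed - league_mean)


# A team that won by 14 points, in a league where noise dominates the score:
estimate = expected_ability(observed=14, league_mean=0,
                            var_ability=36, var_noise=108)
print(estimate)  # closer to the league mean than the raw 14-point margin
```

The noisier the scores are relative to the spread of true abilities, the more strongly the estimate is pulled toward the mean, which is the effect the abstract says bettors underweight.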
Arie-Willem de Leeuw, Laurentius A Meerhoff, Arno Knobbe
This article focuses on the performance of runners in official races. Based on extensive public data from participants of races organized by the Boston Athletic Association, we demonstrate how different pacing profiles can affect the performance in a race. An athlete's pacing profile refers to the running speed at various stages of the race. We aim to provide practical, data-driven advice for professional as well as recreational runners. Our data collection covers 3 years of data made public by the race organizers, and primarily concerns the times at various intermediate points, giving an indication of the speed profile of the individual runner...
November 13, 2018: Big Data
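The abstract above defines a pacing profile as the running speed at various stages of a race, derived from times at intermediate points. A minimal sketch of that computation follows; the function name `pacing_profile`, the units, and the sample splits are assumptions made for illustration and do not come from the article's data.

```python
def pacing_profile(checkpoints_km, cumulative_times_s):
    """Average speed (km/h) on each segment between timing points.

    checkpoints_km: cumulative distance at each timing point, ascending.
    cumulative_times_s: elapsed time in seconds at each timing point.
    The start line (0 km, 0 s) is implied.
    """
    if len(checkpoints_km) != len(cumulative_times_s):
        raise ValueError("need one time per checkpoint")
    speeds = []
    prev_km, prev_s = 0, 0
    for km, s in zip(checkpoints_km, cumulative_times_s):
        segment_km = km - prev_km
        segment_s = s - prev_s
        if segment_km <= 0 or segment_s <= 0:
            raise ValueError("checkpoints and times must be strictly increasing")
        speeds.append(segment_km * 3600 / segment_s)
        prev_km, prev_s = km, s
    return speeds


# Hypothetical runner: 5 km in 25:00, then 10 km in 55:00 (slowing down).
print(pacing_profile([5, 10], [1500, 3300]))
```

Comparing the per-segment speeds shows whether a runner started fast and faded or held an even pace.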
Floris R Goes, Matthias Kempe, Laurentius A Meerhoff, Koen A P M Lemmink
In professional soccer, almost every team now employs tracking technology to monitor performance during training sessions and matches. In recent years, there has been a rapid increase in both the quality and quantity of data collected in soccer, resulting in large amounts of data collected by teams every single day. The sheer amount of available data provides opportunities as well as challenges to both science and practice. Traditional experimental and statistical methods used in sport science do not seem fully capable of exploiting the possibilities of the large amounts of data in modern soccer...
September 21, 2018: Big Data
Farrokh Alemi, Sanja Avramovic, Mark D Schwartz
Existing methods of screening for substance abuse (standardized questionnaires or clinicians simply asking) have proven difficult to initiate and maintain in primary care settings. This article reports on how predictive modeling can be used to screen for substance abuse using extant data in electronic health records (EHRs). We relied on data available through Veterans Affairs Informatics and Computing Infrastructure (VINCI) for the years 2006 through 2016. We focused on 4,681,809 veterans who had at least two primary care visits; 829,827 of whom had a hospitalization...
September 1, 2018: Big Data
Maxime C Cohen, C Daniel Guetta, Kevin Jiao, Foster Provost
We develop a number of data-driven investment strategies that demonstrate how machine learning and data analytics can be used to guide investments in peer-to-peer loans. We detail the process starting with the acquisition of (real) data from a peer-to-peer lending platform all the way to the development and evaluation of investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses, at the undergraduate or graduate levels...
September 1, 2018: Big Data
David J Hand
Ready data availability, cheap storage capacity, and powerful tools for extracting information from data have the potential to significantly enhance the human condition. However, as with all advanced technologies, this comes with the potential for misuse. Ethical oversight and constraints are needed to ensure that an appropriate balance is reached. Ethical issues involving data may be more challenging than the ethical challenges of some other advanced technologies partly because data and data science are ubiquitous, having the potential to impact all aspects of life, and partly because of their intrinsic complexity...
September 1, 2018: Big Data
Emilio Carrizosa, Vanesa Guerrero, Daniel Hardt, Dolores Romero Morales
In this article we develop a novel online framework to visualize news data over a time horizon. First, we perform a Natural Language Processing analysis, wherein the words are extracted, and their attributes, namely the importance and the relatedness, are calculated. Second, we present a Mathematical Optimization model for the visualization problem and a numerical optimization approach. The model represents the words using circles, the time-varying area of which displays the importance of the words in each time period...
June 2018: Big Data
Pablo Basanta-Val, Luis Sánchez-Fernández
The proliferation of new data sources, stemming from the adoption of open-data schemes, in combination with increasing computing capacity, has given rise to a new type of analytics that processes Internet of things data with low-cost engines, using parallel computing to speed up data processing. In this context, the article presents an initiative, called BIG-Boletín Oficial del Estado (BOE), designed to process the Spanish official government gazette (BOE) with state-of-the-art processing engines, to reduce computation time and to offer additional speed up for big data analysts...
June 2018: Big Data
Wilfried Lemahieu, Seppe Vanden Broucke, Bart Baesens
No abstract text is available yet for this article.
June 2018: Big Data
Yadigar Imamverdiyev, Fargana Abdullayeva
In this article, the application of the deep learning method based on Gaussian-Bernoulli type restricted Boltzmann machine (RBM) to the detection of denial of service (DoS) attacks is considered. To increase the DoS attack detection accuracy, seven additional layers are added between the visible and the hidden layers of the RBM. Accurate results in DoS attack detection are obtained by optimization of the hyperparameters of the proposed deep RBM model. The form of the RBM that allows application of the continuous data is used...
June 2018: Big Data
Andrej Duh, Marjan Slak Rupnik, Dean Korošak
Computational propaganda deploys social or political bots to try to shape, steer, and manipulate online public discussions and influence decisions. The collective behavior of populations of social bots has not yet been widely studied, although an understanding of the collective patterns arising from interactions between bots would aid social bot detection. In this study, we show that there are significant differences in collective behavior between populations of bots and populations of humans as detected from their Twitter activity...
June 2018: Big Data
Vaibhav Pandey, Poonam Saini
The MapReduce (MR) computing paradigm and its open source implementation Hadoop have become a de facto standard for processing big data in a distributed environment. Initially, the Hadoop system was homogeneous in three significant aspects, namely, user, workload, and cluster (hardware). However, with the growing variety of MR jobs and the inclusion of different node configurations in the existing cluster, heterogeneity has become an essential part of Hadoop systems. The heterogeneity factors adversely affect the performance of a Hadoop scheduler and limit the overall throughput of the system...
June 2018: Big Data
Zoran Obradovic
No abstract text is available yet for this article.
June 2018: Big Data
Varol Onur Kayhan, Alison Watkins
This article proposes a novel approach, called data snapshots, to generate real-time probabilities of winning for National Basketball Association (NBA) teams while games are being played. The approach takes a snapshot from a live game, identifies historical games that have the same snapshot, and uses the outcomes of these games to calculate the winning probabilities of the teams in this game as the game is underway. Using data obtained from 20 seasons worth of NBA games, we build three models and compare their accuracies to a baseline accuracy...
June 2018: Big Data
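The data snapshots idea described in the abstract above (take a snapshot of a live game, find historical games with the same snapshot, and use their outcomes as the win probability) can be sketched as a simple lookup. The abstract does not say what a snapshot contains, so the `(quarter, home_lead)` tuple, the function names, and the sample history below are all hypothetical; this is an illustration of the general idea, not the authors' three models.

```python
from collections import defaultdict


def build_snapshot_table(history):
    """Count outcomes per snapshot.

    history: iterable of (snapshot, home_won) pairs, where snapshot is any
    hashable description of a game state (here a hypothetical
    (quarter, home_lead) tuple) and home_won is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # snapshot -> [home wins, games]
    for snapshot, home_won in history:
        counts[snapshot][0] += int(home_won)
        counts[snapshot][1] += 1
    return dict(counts)


def home_win_probability(table, snapshot):
    """Fraction of matching historical games the home team won.

    Returns None when no historical game shares the snapshot.
    """
    wins, games = table.get(snapshot, (0, 0))
    if games == 0:
        return None
    return wins / games


# Hypothetical history: three games where the home team led by 5 in Q4.
history = [((4, 5), True), ((4, 5), True), ((4, 5), False), ((2, -3), False)]
table = build_snapshot_table(history)
print(home_win_probability(table, (4, 5)))
print(home_win_probability(table, (1, 0)))  # no matching games
```

A finer-grained snapshot matches fewer historical games, so any real implementation has to trade specificity of the game state against the number of matches available.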
Saurabh Nagrecha, Reid A Johnson, Nitesh V Chawla
Nonstandard insurers suffer from a peculiar variant of fraud wherein an overwhelming majority of claims have the semblance of fraud. We show that state-of-the-art fraud detection performs poorly when deployed at underwriting. Our proposed framework "FraudBuster" represents a new paradigm in predicting segments of fraud at underwriting in an interpretable and regulation compliant manner. We show that the most actionable and generalizable profile of fraud is represented by market segments with high confidence of fraud and high loss ratio...
March 2018: Big Data