Mining Credible and Relevant News from Social Networks
Published in The Fifth International Conference on Big Data Analytics (BDA 2017) (Full Paper Acceptance Rate: 12%), 2017
Recommended citation: Ankur Garg, Varun Syal, Pankaj Gudlani, and Dhaval Patel. 2017. Mining Credible and Relevant News from Social Networks. In the Fifth International Conference on Big Data Analytics (BDA 2017), Hyderabad, India, December 12-15, 2017, 14 pages.
Today, people increasingly access news through social networks like Twitter, whether the news concerns a parliamentary election or a famous entertainment celebrity. Moreover, these platforms allow people to like, retweet, and comment on a shared news article. This activity, along with the article itself, shapes the opinions and beliefs of the people who read it. However, a major problem we face today is the misuse of these networks to spread rumors and misleading news content. This is the practice of yellow journalism, which aims to disrupt public sentiment. To address this problem, we present a methodology to find credible and relevant tweets that refer to actual news articles published on news websites. Our methodology scores each tweet based on the reputation of the users sharing it, the news publisher that published the article, and the popularity of the news concepts mentioned in the article. We model the interaction between these three entities as a tripartite graph and propose a formulation based on the Co-HITS algorithm to score all the entities involved. The scores of the individual entities are then used to assign each tweet a score that indicates the credibility and relevance of the news mentioned in it. We find that the presence of many bots is also a significant problem in these networks and can affect the results of such explorations. Thus, we use existing bot detection techniques to identify bots and propose an efficient approach to limit their influence on the system. Finally, we present a qualitative evaluation of our proposed system on a set of approximately 8000 tweets.
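The abstract does not state the update equations, so the following is only a minimal sketch of how mutual score propagation over a user/publisher/concept graph might look. It assumes a generic Co-HITS-style rule in which each entity's score mixes a prior with scores received from its neighbours. The function name `score_entities`, the damping parameter `lam`, the uniform priors, the co-occurrence edge weights, the averaging used for the tweet score, and the toy data are all assumptions made for illustration and are not taken from the paper. Bot handling is omitted.

```python
from collections import defaultdict


def score_entities(tweets, lam=0.8, iterations=50):
    """Propagate scores over a user/publisher/concept graph (illustrative only).

    tweets: list of (user, publisher, concepts) triples, concepts being a list.
    Returns (entity_scores, tweet_scores).
    """
    # Assumption: two entities are linked once for every tweet that mentions
    # both, and only entities of different types are linked (tripartite).
    weights = defaultdict(lambda: defaultdict(float))
    for user, publisher, concepts in tweets:
        members = [("user", user), ("publisher", publisher)]
        members += [("concept", c) for c in concepts]
        for a in members:
            for b in members:
                if a[0] != b[0]:
                    weights[a][b] += 1.0

    nodes = list(weights)
    out_weight = {n: sum(weights[n].values()) for n in nodes}
    prior = {n: 1.0 / len(nodes) for n in nodes}  # assumption: uniform priors
    scores = dict(prior)

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m passes on a share of its score proportional
            # to the weight of the edge (m, n); weights are symmetric.
            received = sum(scores[m] * w / out_weight[m]
                           for m, w in weights[n].items())
            updated[n] = (1 - lam) * prior[n] + lam * received
        scores = updated

    # Assumption: a tweet's score is the plain average of its user score,
    # publisher score, and mean concept score.
    tweet_scores = []
    for user, publisher, concepts in tweets:
        concept_part = (sum(scores[("concept", c)] for c in concepts) / len(concepts)
                        if concepts else 0.0)
        tweet_scores.append((scores[("user", user)]
                             + scores[("publisher", publisher)]
                             + concept_part) / 3)
    return scores, tweet_scores


# Hypothetical toy input: (user, publisher, concepts mentioned in the article).
toy_tweets = [
    ("alice", "daily_news", ["election"]),
    ("bob", "daily_news", ["election", "budget"]),
    ("carol", "gossip_blog", ["celebrity"]),
]
entity_scores, tweet_scores = score_entities(toy_tweets)
```

In this toy input, the publisher shared by two users receives a higher score than the publisher shared by one, and the first tweet scores higher than the third. A real system would replace the uniform priors with reputation signals and down-weight users flagged as bots.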