Maja Vukovic, Jim Laredo
IBM T.J. Watson Research Center
Yorktown Heights, NY, 10598
IBM Research - Africa Lab
ABSTRACT
Local governments and enterprises increasingly rely on social media to engage with citizens and employees and to respond to events, including both natural and man-made disasters. In this paper we present results from a pilot deployment of CrisisTracker in Kenya. We discuss the applicability of this system to monitoring and responding to different types of security events in the region. We describe how we addressed one of the key challenges, the low volume of user-generated data in the region, by extending the system to incorporate an online newspaper corpus. We conclude with insights into the key challenges in adapting existing technologies and making them consumable by emerging economies.
1 INTRODUCTION
With over 25% of the population in Kenya using the Internet, and 15% using social networking sites, there is an opportunity for novel applications of the Social Web. With over 10 million Kenyans using the Internet, Kenya is being branded as the “Silicon Savannah”.
Kenyans are second only to South Africans in the number of tweets they generate. In 2012, Twitter was the go-to channel for political activists to protest and peacefully express their position to lawmakers. The Westgate Mall attack in 2013 saw increased use of Twitter during and after the incident by victims, reporters and disaster response organizations.
In this paper we present an application of CrisisTracker, a system that employs analytics and crowdsourcing technologies to derive actionable insights from Social Web data, to enable local governments and enterprises to drive disaster planning and crowd coordination during potentially threatening events in growth markets. We discuss the results of our pilot deployment of CrisisTracker for security threat tracking. We describe how we extended CrisisTracker with additional data sources to make it more relevant in Kenya. Finally, we discuss the challenges faced in deploying novel technologies in emerging economies.
2 SOCIAL WEB FOR SECURITY MONITORING
Motivated by the then upcoming 2013 elections in Kenya, we put CrisisTracker to the test. CrisisTracker automatically tracks sets of keywords on Twitter and generates stories by grouping highly related tweets using a cosine similarity metric. An extended version of an algorithm based on locality-sensitive hashing is applied for clustering. The grouped stories (clusters of similar Twitter messages) are then classified and verified by crowd volunteers.
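The grouping step can be illustrated with a minimal sketch. It is not CrisisTracker's implementation: it assumes plain bag-of-words vectors, a single hash table of random-hyperplane signatures (the basic scheme for cosine similarity), and a hypothetical similarity threshold of 0.5. The names `HyperplaneLSH` and `group_tweets` are invented for this example.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HyperplaneLSH:
    """Random-hyperplane signatures: one bit per hyperplane."""

    def __init__(self, n_bits=8, seed=0):
        self.n_bits = n_bits
        self.seed = seed
        self._weights = {}  # token -> that token's coordinate on each hyperplane

    def _token_weights(self, token):
        if token not in self._weights:
            rng = random.Random(f"{self.seed}:{token}")  # deterministic per token
            self._weights[token] = [rng.gauss(0, 1) for _ in range(self.n_bits)]
        return self._weights[token]

    def signature(self, vec):
        sums = [0.0] * self.n_bits
        for token, count in vec.items():
            for i, w in enumerate(self._token_weights(token)):
                sums[i] += count * w
        return tuple(s >= 0 for s in sums)


def group_tweets(tweets, threshold=0.5, n_bits=8, seed=0):
    """Group tweets into stories; returns lists of tweet indices."""
    lsh = HyperplaneLSH(n_bits, seed)
    buckets = defaultdict(list)  # signature -> ids of clusters started there
    clusters, vectors = [], []
    for idx, text in enumerate(tweets):
        vec = Counter(tokenize(text))
        vectors.append(vec)
        sig = lsh.signature(vec)
        best, best_sim = None, threshold
        # Only clusters in the same hash bucket are compared (the LSH shortcut).
        for cid in buckets[sig]:
            sim = cosine(vec, vectors[clusters[cid][0]])
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            clusters.append([idx])
            buckets[sig].append(len(clusters) - 1)
        else:
            clusters[best].append(idx)
    return clusters
```

With a single table, near-duplicate tweets can land in different buckets and be missed; practical systems typically query several tables or nearby signatures to trade extra comparisons for recall.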
We explored CrisisTracker’s applicability to tracking security threats in the region. Users were corporate HR and Security professionals whose daily job included monitoring for security threats in order to manage the on-the-ground sales force and their wellbeing. One common use case was notification of traveling staff about nearby incidents.
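As an illustration of the notification use case, the sketch below selects staff located within a given radius of an incident using the haversine great-circle distance. The 5 km radius, the coordinate format, and the function names are assumptions made for this example and are not part of CrisisTracker.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def staff_to_notify(incident, staff, radius_km=5.0):
    """Return names of staff whose last known position is within radius_km of the incident.

    incident: (lat, lon) tuple; staff: dict mapping name -> (lat, lon).
    """
    return [
        name
        for name, (lat, lon) in staff.items()
        if haversine_km(incident[0], incident[1], lat, lon) <= radius_km
    ]
```

For example, with an incident in central Nairobi, a staff member about one kilometre away would be selected, while one in Mombasa would not.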
3 USER STUDY
Our main objective in this work was to gather feedback from the system's users on how to improve the system and adapt it to the Kenyan region and user base.
A. Evaluation Setup and Results
The evaluation was done in two phases. First, in early November 2012, we deployed the system and worked with pilot users over 2 days. The second evaluation followed in early December 2012, when we engaged an additional 20 users. Users were asked to tag, merge, split and remove stories from CrisisTracker. In both cases, the pilot was followed by a 30-minute interview with each user (some users opted to respond through a questionnaire instead).
The questionnaire consisted of 30 questions that covered the following aspects of the system:
Users commented on the high-value features. One user said: 'I liked the hide option – it gives me the control over the system'. Regarding the usability of the system and its integration with security processes in the corporation, all 6 users identified the need for mobile access and the ability to increase the reliability of story sources. Users reflected on the need for an 'e-mail notification option, if this is to be integrated into our workplace'. A common thread in making this system relevant during a disaster is the requirement for formalized reports, which would distill actionable and meaningful data points from the captured stories, for example the number of casualties in a region or the type and number of damaged infrastructure elements.
B. Lessons Learnt
As network connectivity varies in the region, the system needs to handle these changes gracefully and offer alternative means of data access (e.g. e-mail), depending on the user’s context. Furthermore, at the time of the deployment there was no direct way to submit reports to CrisisTracker. Some of the pilot users used their own Twitter handles to feed situational information into the system.
On the technical front we faced the challenge of a low volume of data, which we attribute to two reasons. First, the pilot deployment relied on Twitter’s free firehose, which limits the number of Twitter messages we could track. Second, Twitter usage in the Kenya region was on the lower end of the spectrum. This became even more evident when we compared the volume of Twitter messages our system collected with that of the first pilot deployment of CrisisTracker, in Syria, where the system processed 450K messages daily. Based on that pilot, the team estimated that 15 volunteers working 30 minutes daily can provide the full meta-data during an event of such scale.
Motivated by the results of this pilot, we set out to address these challenges. The next section describes our approach to extending the system to work with an online newspaper corpus in Kenya.
4 SYSTEM EXTENSIONS
Local newspapers are a key source of information on current events in Kenya. In addition to their printed versions, most of the leading newspapers have online editions complemented by Twitter accounts. As a result, we developed the following extension to CrisisTracker:
To augment the tweets with newspaper stories, the extension to CrisisTracker downloads newspaper stories from the online sites. It then breaks each story into sections of whole words totaling roughly 140 characters, the size of a tweet. To maintain the distinction, these sections are referred to as pseudo-tweets. The pseudo-tweets are then passed through the same CrisisTracker algorithm used for regular tweets to identify similarities. The similarities of the pseudo-tweets within the same story are then used to calculate how similar the story is to a tweet. This approach increased our corpus and made CrisisTracker more relevant to the Kenyan environment. We opted for the pseudo-tweet approach to accelerate the deployment of the new system. Future work will look into incorporating entire newspaper documents.
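The pseudo-tweet idea can be sketched as follows. This is an illustrative approximation, not the deployed extension: it splits on whitespace, scores similarity with plain bag-of-words cosine rather than the full CrisisTracker pipeline, and aggregates per-section scores with a maximum, which is an assumption since the aggregation rule is not specified here. All function names are hypothetical.

```python
import math
import re
from collections import Counter

TWEET_SIZE = 140  # characters, the size of a tweet


def to_pseudo_tweets(story, size=TWEET_SIZE):
    """Split a story into chunks of whole words, each at most `size` characters.

    A single word longer than `size` becomes its own (oversized) chunk.
    """
    chunks, current = [], ""
    for word in story.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def bag_of_words(text):
    """Lowercased token counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9#@']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def story_similarity(story, tweet, size=TWEET_SIZE):
    """Score a story against a tweet via its pseudo-tweets.

    Assumption: the story's score is the best score among its pseudo-tweets.
    """
    tweet_vec = bag_of_words(tweet)
    sims = [cosine(bag_of_words(chunk), tweet_vec) for chunk in to_pseudo_tweets(story, size)]
    return max(sims, default=0.0)
```

Taking the maximum lets one closely matching section link a long article to a tweet; an average would instead favor stories that are on topic throughout.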
5 RELATED WORK
With the explosion of user-generated data and social networking, numerous commercial and research systems that generate insights tailored for disaster management have sprung up [5,6,7,8]. Ushahidi gained popularity as a means of curating and geo-visualizing manually submitted reports from a wide range of stories during the 2007 elections in Kenya and the resulting post-election violence. Over time, advances in automated data collection and processing led to the development of tools that made the detection of events in social media more efficient [6,7,8].
Whilst technical and deployment challenges remain, in particular related to the availability and credibility of data, NGOs, law enforcement agencies and local governments are increasingly turning to social media as one data source in their decision making throughout the disaster management process.
The focus of our work is on understanding the consumability of novel technologies in an emerging market like Kenya (Africa), where user literacy, network connectivity, infrastructure, and the resources to manage these processes vary significantly. As we have shown, there is great promise in adapting existing technologies to make them relevant to emerging economies. Another future goal is to increase the availability of such systems in local languages, as this will be critical to the adoption of such technologies across Africa.
REFERENCES
[1] Miniwatts Marketing Group, internetworldstats.com, 2013.
[2] Kenya at 50: how social media has increased the pace of change. Available at: http://www.theguardian.com/global-development-professionals-network/2013/dec/13/kenya-social-media-mark-kaigwa
[3] J. Rogstadius, M. Vukovic, C.A. Teixeira, V. Kostakos, E. Karapanos, J.A. Laredo. CrisisTracker: Crowdsourced social media curation for disaster awareness. IBM Journal of Research and Development 57(5), 4--1, IBM, 2013.
[4] M. Charikar, "Similarity estimation techniques from rounding algorithms," in Proc. 34th Annual ACM Symposium on Theory of Computing, Montreal, 2002, pp. 380-388.
[5] Ushahidi, Online: http://ushahidi.org
[6] S. Kumar, G. Barbier, M. A. Abbasi, H. Liu. "TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief," Demonstration Paper at 5th International AAAI Conference on Weblogs and Social Media, Barcelona, 2011.
[7] F. Abel, C. Hauff, G.J. Houben, R. Stronkman, K. Tao, "Semantics + Filtering + Search = Twitcident. Exploring Information in Social Web Streams," in Proc. International Conference on Hypertext and Social Media, Milwaukee, 2012, pp. 285-294.
[8] R. Jain, L. Jalali, S. Pongpaichet, A. Gupta. Building Social Life Networks. IEEE Data Engineering Bulletin 36(3): 91-98, 2013.
[9] P. Meier. Human Computation for Disaster Response. In Handbook of Human Computation, ed. Pietro Michelucci et al., Springer, 2014.