Consumability of Social Web Insights in Emerging Economies

Maja Vukovic, Jim Laredo
IBM T.J. Watson Research Center
Yorktown Heights, NY, 10598
{maja, laredoj}

Osamuyimen Stewart            
IBM Research - Africa Lab    
Nairobi, Kenya            

Anthony Mwangi
Strathmore University   
Nairobi, Kenya

Local governments and enterprises are increasingly relying on social media to engage with citizens and employees, and to respond to events, including both natural and man-made disasters. In this paper we present results from a pilot deployment of CrisisTracker [3] in Kenya. We discuss the applicability of this system to monitoring and responding to different types of security events in the region. We describe how we addressed one of the key challenges, the low volume of user-generated data in the region, by extending the system to incorporate an online newspaper corpus. We conclude with our insights on the key challenges in adapting existing technologies and making them consumable by emerging economies.

With over 25% of Kenya's population using the Internet, and 15% using social networking sites [1], the country presents an opportunity for novel applications of the Social Web. With over 10 million Kenyans online, Kenya is being branded the "Silicon Savannah".

In social media statistics, Kenyans are second only to South Africans in the quantity of tweets they generate [2]. In 2012, Twitter was the go-to channel for political activists to protest and peacefully express their positions to lawmakers. The 2013 tragedy of the Westgate Mall attack saw increased use of Twitter during and after the incident, by victims, reporters, and disaster-response organizations.

In this paper we present an application of CrisisTracker [3], a system that employs analytics and crowdsourcing technologies to derive actionable insights from Social Web data, enabling local governments and enterprises to drive disaster planning and crowd coordination during potentially threatening events in growth markets. We discuss the results of our pilot deployment of CrisisTracker for security-threat tracking, and describe how we extended CrisisTracker to make it more relevant in Kenya by incorporating additional data sources. Finally, we discuss the challenges of deploying novel technologies in emerging economies.

Motivated by the then-upcoming 2013 elections in Kenya, we put CrisisTracker [3] to the test. CrisisTracker automatically tracks sets of keywords on Twitter and generates stories by grouping highly related tweets using a cosine similarity metric. An extended version of an algorithm based on locality-sensitive hashing is applied for clustering [4]. Grouped stories (clusters of similar Twitter messages) are then classified and verified by crowd volunteers.
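The clustering idea can be sketched as follows. This is a minimal illustration of the random-hyperplane (SimHash) scheme from [4], not CrisisTracker's actual code; all function names and parameters here are our own.

```python
import numpy as np

def simhash_signature(vec, planes):
    # Each sign of the projection onto a random hyperplane contributes
    # one bit of the locality-sensitive hash (Charikar, 2002).
    return tuple(bool(b) for b in (planes @ vec) >= 0)

def bucket_tweets(vectors, n_bits=16, seed=0):
    # Tweets whose signatures collide fall into the same candidate bucket;
    # exact cosine similarity then only needs to be computed within buckets,
    # instead of across all pairs of tweets.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, vectors.shape[1]))
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(simhash_signature(vec, planes), []).append(i)
    return buckets

def cosine(a, b):
    # Similarity metric used to decide whether two tweets join one story.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Tweets with near-identical term vectors are likely to share a signature, so candidate story members can be found without comparing every pair of messages.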

Fig. 1: CrisisTracker deployment in Kenya

We explored CrisisTracker's applicability to tracking security threats in the region. Users were corporate HR and security professionals whose daily job included monitoring for security threats in order to manage the on-the-ground sales force and their well-being. One common use case was notifying traveling staff about nearby incidents.

Our main objective in this work was to get feedback from the system's users on how to improve and adapt the system to the Kenyan region and user base.

A. Evaluation Setup and Results

The evaluation was done in two phases. First, in early November 2012, we deployed the system and worked with pilot users over two days. The second evaluation followed in early December 2012, when we engaged an additional 20 users. Users were asked to tag, merge, split, and remove stories in CrisisTracker. In both cases, the pilot was followed by a 30-minute interview with each user (some users opted to send in their responses via a questionnaire).

The questionnaire consisted of 30 questions that covered the following aspects of the system:
  1. Record of the user’s system usage: in this section we wanted to validate how long the user used the system, which main actions they performed, and which features they used. For example, did the user click on individual stories to read them, and did they follow any other information sources?
  2. Evaluation of the user interface: our intention was to understand whether the user felt that the right stories were suggested by the system as important. We also wanted to find out whether any tasks on the UI were tedious or inefficient.
  3. Evaluation of the concept: the purpose of these questions was to uncover any story types that were difficult to work with, and what the user’s motivation for using the system would be.
  4. User’s perspective: we included several questions targeting the user’s perception of the tool and applicable usage domains. We proposed a set of possible extensions to the system and asked users to indicate their importance.
  5. Reflection on recent events where the system could be used: we asked the user to think back to a recent crisis event and relate how CrisisTracker may or may not have helped.
During the first pilot, which ran over the course of two days, the system collected 132,738 stories based on 363 tracked keywords (hashtags, usernames, and locations). Thirteen stories were tagged, and 30 were deleted and/or merged by the 6 system users. Table 1 summarizes key findings from the interviews.

Table 1: Questionnaire insights

Users commented on the high-value features. One user said: 'I liked the hide option – it gives me the control over the system'. Regarding the usability of the system and its readiness to be integrated with corporate security processes, all 6 users identified the need for mobile access and for increasing the reliability of story sources. Users reflected on the need for an 'e-mail notification option, if this is to be integrated into our workplace'. A common thread in making this system relevant at disaster time is the requirement for formalized reports, which would distill actionable and meaningful data points from the captured stories, for example, the number of casualties in a region, or the type and number of damaged infrastructure elements.

B. Lessons Learnt

As network connectivity varies in the region, the system needs to handle these changes gracefully and offer alternative means of data access (e.g., even e-mail), depending on the user’s context. Furthermore, at the time of the deployment there was no direct way to submit reports to CrisisTracker; some of the pilot users used their own Twitter handles to feed the system with situational information.

On the technical front, we faced the challenge of low data volume, attributable to two factors. First, the pilot deployment relied on Twitter’s free streaming API rather than the full firehose, which limited the number of Twitter messages we could track. Second, Twitter usage in the Kenya region was on the lower end of the spectrum. This became even more evident when we compared the volume of Twitter messages our system collected with the first pilot deployment of CrisisTracker in Syria [3], where the system processed 450K messages daily. Based on that pilot, the team estimated that 15 volunteers working 30 minutes daily can curate the full metadata during an event of such scale [3].

Motivated by the results of this pilot, we set out to address these challenges. The next section describes our approach to extending the system to work with an online newspaper corpus in Kenya.

A key source of information on current events in Kenya is the local newspapers. In addition to their printed versions, most of the leading newspapers maintain online editions complemented by Twitter accounts. As a result, we developed the following extension to CrisisTracker:

Fig. 2: System extension of CrisisTracker in Kenya

To augment the tweets with newspaper stories, the extension to CrisisTracker downloads newspaper stories from the online sites. It then breaks each story into sections of whole words whose combined length is roughly 140 characters, the size of a tweet. To maintain the distinction, these sections are referred to as pseudo-tweets. The pseudo-tweets are then passed through the same CrisisTracker algorithm used for regular tweets to identify similarities. The similarities between the pseudo-tweets of a story are used to calculate how similar that story is to a tweet. This approach increased our corpus and made CrisisTracker more relevant to the Kenyan environment. We opted for the pseudo-tweet approach to accelerate the deployment of the new system; future work will look into incorporating entire newspaper documents.
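The chunking step can be sketched as follows. This is an illustrative reconstruction under our own naming and greedy word-packing assumption, not the deployed code.

```python
def to_pseudo_tweets(story_text, max_len=140):
    # Greedily pack whole words into chunks of at most max_len characters,
    # so each chunk approximates a tweet-sized unit of newspaper text.
    chunks, current = [], ""
    for word in story_text.split():
        if len(word) > max_len:
            # Flush the current chunk and hard-split an over-long token.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(word[i:i + max_len]
                          for i in range(0, len(word), max_len))
        elif len(current) + len(word) + bool(current) <= max_len:
            # The word fits (plus one joining space if the chunk is non-empty).
            current = f"{current} {word}".strip()
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be fed to the tweet-clustering pipeline as if it were an ordinary message.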

With the explosion of user-generated data and social networking, numerous commercial and research systems that generate insights tailored for disaster management have emerged [5,6,7,8]. Ushahidi [5] gained its popularity as a means of curating and geo-visualizing manually submitted reports covering a wide range of stories during the 2007 elections in Kenya and the post-election violence that followed. Over time, advances in automated data collection and processing resulted in tools that made the detection of events in social media more efficient [6,7,8].

Whilst technical and deployment challenges remain, in particular those related to the availability and credibility of data, NGOs, law enforcement agencies, and local governments are increasingly turning to social media as one data source in their decision making throughout the disaster-management process [8].

The focus of our work is on understanding the consumability of novel technologies in an emerging market like Kenya (Africa), where user literacy, network connectivity, infrastructure, and the resources to manage these processes significantly vary. As we have shown, there is great promise in adapting existing technologies to make them relevant to emerging economies. Another future goal is to increase availability of such systems in local languages as this will be critical to adoption of such technologies across Africa.


[1] Miniwatts Marketing Group, 2013.

[2] Kenya at 50: how social media has increased the pace of change. Available at:

[3] J. Rogstadius, M. Vukovic, C. A. Teixeira, V. Kostakos, E. Karapanos, and J. A. Laredo, "CrisisTracker: Crowdsourced social media curation for disaster awareness," IBM Journal of Research and Development, 57(5), 4:1, 2013.

[4] M. Charikar, "Similarity estimation techniques from rounding algorithms," In Proc. 34th annual ACM symposium on Theory of computing, Montreal, 2002, pp. 380-388.

[5] Ushahidi, Online:

[6] S. Kumar, G. Barbier, M. A. Abbasi, H. Liu. "TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief," Demonstration Paper at 5th International AAAI Conference on Weblogs and Social Media, Barcelona, 2011.

[7] F. Abel, C. Hauff, G.J. Houben, R. Stronkman, K. Tao, "Semantics + Filtering + Search = Twitcident. Exploring Information in Social Web Streams," in Proc. International Conference on Hypertext and Social Media, Milwaukee, 2012, pp. 285-294.

[8] R. Jain, L. Jalali, S. Pongpaichet, and A. Gupta, "Building Social Life Networks," IEEE Data Engineering Bulletin, 36(3), pp. 91-98, 2013.

[9] P. Meier, "Human Computation for Disaster Response," in Handbook of Human Computation, ed. Pietro Michelucci et al., Springer, 2014.

Dr. Maja Vukovic is a Research Staff Member and a Master Inventor at the Thomas J. Watson Research Center. At IBM, Maja is working on enterprise crowdsourcing and API ecosystem innovation. Maja has received an IBM Outstanding Technical Achievement Award for her technical contributions to the field of enterprise crowdsourcing. She has authored over 50 technical papers and filed over 40 patent applications. Maja has organized several workshops on crowdsourcing and social media for disaster management. She received a Ph.D. degree in computer science from the University of Cambridge in 2006. Previously she worked as a Research Scientist at the Mercedes-Benz Research Lab in Palo Alto on telematics systems. Dr. Vukovic is a senior member of IEEE (Institute of Electrical and Electronics Engineers).

Jim A. Laredo has over 20 years of experience in the Information Technology industry, including the areas of Transaction Processing (Transarc Corporation), Business Process Management (Vitria Technology), Software as a Service, Cloud Computing, and Systems Management (IBM). He has been building systems and services for customers in financial services, telecommunications, insurance services, and manufacturing. Jim has spent the last 10 years at IBM, 8 of them at the T.J. Watson Research Center in NY, and the last 3 years in particular working in the area of enterprise crowdsourcing, looking for innovative techniques that leverage people's knowledge to scale business process management across the organizations of the enterprise. Mr. Laredo has an Engineering degree from Universidad Simon Bolivar, Caracas, Venezuela, and an M.Sc. in Computer Science from the University of Toronto.

Osamuyimen (Uyi) Stewart is the Chief Scientist at IBM Research - Africa, located in Nairobi, Kenya. He obtained a Master of Philosophy degree in Linguistics from Cambridge University in 1991 and earned a doctorate degree from McGill University in 1998. He previously taught at the University of British Columbia, and has also worked at Nuance, Call Sciences, and AT&T Labs, leading the design, research, and deployment of advanced human-computer interaction systems. His research is predicated on the systematic application of the principles, rigor, and methodologies from the study of human language and behavior to advancing the science of people in services science and software development. He has authored 40+ publications in top journals and conferences, has been issued 6 patents, and recently received the 2014 Black Engineer of the Year Award (USA) for Outstanding Technical Contributions - Industry.

Anthony Mwangi is an assistant lecturer at Strathmore University, Kenya, and a Research Associate at @iLabAfrica, Kenya. He holds an M.Sc. degree in Information Technology from the same university, where his thesis was titled “A Crowdsourcing Model for Continual Collaboration between Companies and their Consumers”. His research interests lie in data analysis to provide business intelligence to small and medium-sized enterprises.