An Associate Professor in UCLA’s Communication Studies Department, Dr. Francis Steen is transforming the way we understand the media. He is currently immersed in several exciting research efforts, including The NewsScape Project, which he describes as a next-generation interactive visualization of multimedia news.
In his teenage years, Steen became curious about the inner workings of the brain. When he came to UCLA in 2001, he developed an interest in how stories are told in the media and how different forms of visual communication affect people.
For several decades, scholars of political science and communication relied on text databases to conduct their studies. Their research rested on the assumption that the language used in news broadcasts is the single most important aspect of communication. Some researchers today believe that humans convert everything they see and experience into words. “Some think that everything we see gets translated into linguistic terms. For example, if you see a cat, you think the word cat,” said Steen.
Steen contests this belief. He argues that the human visual system comprehends and interprets situations directly, without first translating them into language. “We understand directly visually without translating it into anything else. Much of what we see, we do not even have words for, nor an option to translate it into words. More importantly, the understanding is very fast. It’s in that sensory modality that we understand things directly,” said Steen.
When he arrived at UCLA, Steen sought to understand the impact of media in a way unexplored in past research: through images instead of words. To do so, he enlisted the help of UCLA’s Institute for Digital Research and Education (IDRE). He noted, “We wanted to be able to do a multimodal interpretation of the media. That posed a whole series of questions and problems that IDRE has been very helpful in helping us to solve.”
One challenge was finding a way to quickly search through and catalog images found in the media. Cutting-edge search tools allowed researchers to sift through text quickly, but sophisticated tools to rapidly scour and index images were not readily available. Seeking to address this critical problem, Steen embarked upon The NewsScape Project, which is now an expansive collection of over 250,000 American and international news programs, all neatly indexed and time-stamped. Allowing users to quickly search and stream news content dating back to 2005, the platform opens up exciting possibilities for research and teaching.
The NewsScape Project has its roots in the mid-1970s, when Steen’s colleague Paul Rosenthal began taping the Watergate hearings. Over the years, Rosenthal recorded hundreds of thousands of hours of news broadcasts on analog tapes. Although the archive was vast, it was rarely used because navigating the contents of the tapes was so difficult. Steen noted, “We were interested in doing research on the tapings, but it was extremely labor intensive to access what we needed.” The only way for researchers to analyze the tapes was to sit and watch the programs in their entirety to extract useful information, an exhausting and often fruitless process.
To overcome this problem, Steen taught himself the Linux operating system and single-handedly began converting the analog video recordings to digital. Soon others in Steen’s department recognized the effectiveness of his system and joined the effort. “Scott Waugh, the Dean of Social Sciences at the time, gave my team a seed grant to move over from the analog to the digital capture for maybe 50 to 60 programs a day,” said Steen. Steen and his colleagues quickly realized they could also capture the closed captions and use the time-stamped text files to create an index and build a search engine.
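The article does not say how the NewsScape search engine is implemented. As a rough illustration of the general idea of turning time-stamped caption files into a searchable index, here is a minimal Python sketch of an inverted index. The caption line format (`HH:MM:SS|text`), the function names `build_index` and `search`, and the sample programs are all hypothetical assumptions, not NewsScape's actual code or file format.

```python
import re
from collections import defaultdict

# Hypothetical caption format (an assumption, not NewsScape's real format):
# each caption line looks like "HH:MM:SS|caption text".
WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(programs):
    """Build an inverted index: lowercase word -> list of (program_id, timestamp).

    `programs` maps a hypothetical program_id to its list of caption lines.
    """
    index = defaultdict(list)
    for program_id, lines in programs.items():
        for line in lines:
            timestamp, _, text = line.partition("|")
            # Deduplicate words within one caption line so each hit is recorded once.
            for word in set(WORD_RE.findall(text.lower())):
                index[word].append((program_id, timestamp))
    return index


def search(index, query):
    """Return sorted (program_id, timestamp) hits whose caption line contains every query word."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    # Intersect the posting lists of all query words.
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sample captions, for illustration only.
    programs = {
        "program_a": ["00:00:05|Good evening.", "00:01:12|The oil spill in the Gulf continues."],
        "program_b": ["00:00:30|Oil spill reaches the coast."],
    }
    idx = build_index(programs)
    print(search(idx, "oil spill"))
```

Because every hit keeps its timestamp, a search result can point a researcher straight to the moment in a broadcast where a phrase was spoken, rather than requiring them to watch the whole program.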
After successfully transitioning the system to a digital format, Steen turned to collaborating with other faculty members and securing the funding needed to expand the project further. After Steen attended an IDRE event and discussed his project, he was advised to speak with Song-Chun Zhu, a professor in the statistics and computer science departments at UCLA. “I spoke with Song-Chun Zhu and told him about a visualization idea that I called a virtual reality NewsScape. It would basically involve extracting lots of images at regular intervals and then classifying them and creating a landscape,” said Steen. Zhu believed the project could be a perfect fit for the National Science Foundation’s Cyber-Enabled Discovery and Innovation program.
Steen and Zhu worked hard on a grant application to the NSF. “We worked on it for three years, and we were turned down twice. But the second time we received encouraging feedback, and we really believed in our project,” Steen recalled. As they say, third time’s a charm: on the third attempt, Steen was awarded a four-year, $1.8 million grant from the program.
Armed with funding, Steen and his colleagues were now able to start indexing the communicative patterns present in the digital recordings. “Now we were not only digital, but we were starting to penetrate into this previously dark matter of visual communication. We started opening it up and shedding light into it,” Steen excitedly remarked.
Steen’s project has fostered collaboration between the hard sciences and the humanities. “NewsScape is extremely interdisciplinary because the problems of multimodal and visual communication pose hard problems for both North Campus and South Campus,” said Steen, referring to the parts of UCLA traditionally home to the humanities and social sciences and to the sciences, respectively. His project has challenged computer science experts to apply new perspectives to their research. “Everything they look at in computer vision is traditionally surveillance video, so there is no communicative dimension. The idea of images being used to communicate information was new and revolutionary to them,” Steen stated.
Over the next four years, Steen and his team concentrated on building a more sophisticated data-mining system. Steen first focused on implementing automated story detection, which allows researchers to track stories over time and compare how they are covered. “We were interested in finding out how, for example, Russia covers an event in a certain way and how the U.S. covers that same event in another way and then characterizing those differences,” Steen said. The system is now equipped with automated story segmentation and commercial detection, allowing users to quickly browse and compare different news programs. Currently, Steen is working on expanding the system’s capabilities by incorporating topic detection and clustering.
In late 2010, Steen decided to move the post-processing of incoming files to IDRE’s Hoffman2 Shared Research Cluster. Steen praises the cluster’s speed and efficiency and compliments its staff for being so supportive. Noted Steen, “I worked closely with the system supervisors, and I found the staff was extremely helpful.” He has found the project well supported in terms of computing resources, storage and server-room space, and he has successfully run more than 500 simultaneous jobs on the Hoffman2 Cluster.
In the winter of 2013, the UCLA Library launched the UCLA Library Broadcast NewsScape, giving students and faculty access to the broadcasts recorded by Steen. Offering almost immediate access to coverage of recent and past events, along with comprehensive capture and search capabilities, the archive has the potential to transform scholarship and our view of the world. “There have been several projects using the archives for different types of studies in fields like computer science, political science and information studies,” said Steen.
NewsScape has also garnered international recognition. Steen remarked, “Our project attracted an international group of researchers, that we call the Red Hen Lab, which is an international consortium of people doing research on multimodal communication using the NewsScape archive as its main source of data.”
The lab takes its nickname from the title character of the folk tale “The Little Red Hen.” “The spirit of the Red Hen Lab is that everyone contributes and does something that enhances the collection, either by providing meta-data or recording material,” Steen explained. The Red Hen Lab has brought together researchers from universities around the world, resulting in an expansive collection of digital recordings. “We emphasize that news media today is a global phenomenon. We have material from Denmark, Sweden, Norway and Spain, amongst others. It’s increasingly an international collection,” Steen said.
Now in the last year of his four-year NSF grant, Steen is focused on two things: expanding his research and growing the project internationally. He is currently working with the Information Studies Department to detect sentiment in texts and create interactive visualizations. He also plans to apply for an NSF big data grant in the summer and looks forward to continuing the collaboration between communication studies and computer science.
Additionally, Steen seeks to understand the differences in how social media and elite news media talk about particular events and issues. Working with Todd Presner in Digital Humanities, Steen is comparing how stories are covered on Twitter and Reddit with how they are portrayed on American and international news networks. “We are especially interested in causal reasoning, which is understanding how people assign explanations to what is happening. We argue that the explanations that get assigned to stories determine how society is then disposed to act and respond to events. This issue is critical, and it differs across networks. It differs internationally, and it may differ significantly with democratic and elite media,” Steen remarked.
Steen is enthusiastic about what the future holds for his field. “UCLA is in a unique position to take on these big data projects that bring together expertise from multiple different disciplines and multiple different data for this new multimodal understanding of data flow,” he said. Projects like NewsScape bring together a wide range of fields: while NewsScape involves data mining and high-performance computing, it addresses questions posed by cognitive scientists and psychologists. The project seeks to understand how humans make sense of images and how the media uses persuasion and establishes blended joint attention with viewers. “It is interesting looking at how cognitive science and psychological topics have an impact and can in turn set the agenda for what kind of data mining to do with the people involved in computer vision,” Steen said.
In addition to expanding his research, Steen hopes to grow the project internationally. “We’re interested in setting up an international consortium of different data sets. Different groups will have their own collections and will coordinate at the level of meta-data and access. This consortium will give members the right to go in and access these different collections. There is a certain momentum behind this because the techniques for investigating types of massive data collections are so far advanced that you can begin to ask some really interesting questions and get some interesting answers,” he said.
To learn more about the NewsScape project, visit the NewsScape Project website.