
Siyi Huang knows data science like she knows the back of her own hand. As a DSFederal Data Scientist, she approaches her language processing role on the National Agricultural Library’s (NAL) Agricultural Research Service (ARS) project with ease and precision, using Python both as a communication tool and as a means of grouping hundreds of projects by specific keywords. The high-level grouping she oversees lets ARS project leaders easily view projects before deciding how to use the information.
She is brilliant.
Here’s how she breaks down the process of data analytics: “First you collect the data, clean it and analyze it to meet a client’s requirements. Then comes the visualization.”
This bird’s eye view leads to immense client satisfaction. She said, “ARS has a lot of agricultural-related projects. We utilize the power of data science to provide resources to those projects. We have a lot of data from which the client needs project-specific information. Our job is to help them extract that data.”
Siyi’s interest in data began when she was in middle school, when she would use Excel spreadsheets to help her mom interpret business reports for her factory.
As an undergraduate at the University of Michigan, she was initially an Economics major, but decided the field was too theoretical. “I found that I could learn more from the data itself,” she said, going on to add Statistics as a second major. It was also at this time that she began her odyssey into the field of data science: conducting epidemics projects, including analysis and construction of a prediction model for the age at which women marry. After graduating, she went on to pursue a master’s degree in Statistics from George Washington University.
Since joining NAL, this data genius has brought her expertise to two projects. In the first, a web traffic analysis project, the client wanted to see the number of downloads, page views, and the location and number of visitors for 25 websites. Siyi developed an automatic data pipeline that imported web traffic data and conducted data cleaning and data mining. She used Tableau software to create maps, box plots, histograms, heat maps and tree maps.
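For readers curious what the import-and-clean step of such a pipeline might look like, here is a minimal Python sketch. It is not Siyi's actual pipeline: the column names (`site`, `page_views`, `downloads`) and the function name `summarize_traffic` are assumptions made up for illustration, and only the standard library is used.

```python
import csv
import io
from collections import defaultdict


def summarize_traffic(csv_text):
    """Clean raw traffic rows and total page views and downloads per site.

    Hypothetical input: CSV text with the columns site, page_views, downloads.
    """
    totals = defaultdict(lambda: {"page_views": 0, "downloads": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Cleaning: normalize the site name and skip rows that lack one.
        site = (row.get("site") or "").strip().lower()
        if not site:
            continue
        # Cleaning: skip rows whose counts are missing or not whole numbers.
        try:
            views = int(row["page_views"])
            downloads = int(row["downloads"])
        except (KeyError, TypeError, ValueError):
            continue
        totals[site]["page_views"] += views
        totals[site]["downloads"] += downloads
    return dict(totals)
```

The per-site totals that come out of a step like this are the kind of tidy table a visualization tool such as Tableau could then turn into maps and charts.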
Currently, she uses language processing to help executive leaders better manage data from thousands of agriculture-related projects. Her team gathers project plans, then converts them into a machine-readable format. It’s a one-click product in which the machine quickly groups the projects, simplifying huge amounts of data and promoting collaboration and scientific discovery.
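The idea of grouping projects by keyword can be sketched in a few lines of Python. This is a deliberately simple stand-in that uses exact word matching, not the language processing model the team actually built; the function name `group_by_keyword`, the project IDs, and the sample text are all hypothetical.

```python
import re
from collections import defaultdict


def group_by_keyword(projects, keywords):
    """Map each keyword to the IDs of projects whose text contains it.

    projects: dict of project ID -> plain-text project plan (hypothetical data)
    keywords: list of lowercase keywords to group by
    """
    groups = defaultdict(list)
    for project_id, text in projects.items():
        # Lowercase the text and split it into a set of words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        for keyword in keywords:
            if keyword in words:
                groups[keyword].append(project_id)
    return dict(groups)


# Made-up example input, for illustration only.
sample_projects = {
    "P1": "Soil health and crop rotation",
    "P2": "Crop yield under drought",
    "P3": "Honey bee pollination",
}
sample_groups = group_by_keyword(sample_projects, ["crop", "soil", "bee"])
```

A project can land in more than one group, which is what lets a leader browse the same portfolio by several topics at once.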
“Data can provide you with useful information, help you predict the future and minimize risk,” said Siyi, when asked about the present and future role of data. “Big data is the future,” she added. “DSFederal is getting many new projects related to data science because we understand the power of data.”
Siyi’s ground-breaking work in data science reinforces DSFederal’s position as a leader in data science services. Thank you, Siyi, for your insight, talent and innovation!