Throughout its lifetime at the forefront of developing acoustic-trawl surveys, the ICES Working Group on Fisheries Acoustics, Science, and Technology (WGFAST) has shifted its focus from solving the technical challenges of reliable acoustic measurements to using acoustics to observe key biological parameters such as biomass indices, behavioural metrics, and acoustic species classification.
Big data is one of the next steps in the evolution of fisheries acoustics. These data provide unprecedented observations of the aquatic environment, but with this abundance come the costs of storage, access and discoverability, processing and analysis, and interpretation.
We spoke to the Chair of WGFAST, Mike Jech (NOAA Northeast Fisheries Science Center), and group members Laurent Berger (IFREMER), Nils Olav Handegard (Norwegian Institute of Marine Research), Wu-Jung Lee (Applied Physics Laboratory, University of Washington), Gavin Macaulay (Norwegian Institute of Marine Research), Hjalte Parner (ICES), and Carrie Wall (University of Colorado/NOAA National Centers for Environmental Information) about the importance of big data in fisheries acoustics at present and how they see the role of WGFAST in big data analysis and interpretation in the future.
What is the significance of big data in fisheries acoustics?
Gavin Macaulay: Fisheries acoustic data (primarily from acoustic surveys) are mainly collected for quite narrow, specific objectives – for example, to estimate the biomass of a single species in a single area. However, they can have a much broader use if viewed from a higher level, with many datasets combined over large spatial and temporal scales. This is where the techniques developed for big data can get more out of existing acoustic datasets.
Carrie Wall: Big data is fisheries acoustics. The increased capability of the sonars used for fisheries acoustic research (e.g. broadband) and the expanding number of platforms on which these sensors are mounted (e.g. autonomous vehicles) have resulted, and will continue to result, in exponential growth in the data that scientists are able to collect. Through big data analytics, new insights can be gained into ocean ecosystems.
Nils Olav Handegard: Traditionally, we have processed fisheries acoustic data manually on a survey-by-survey basis, typically for single-species assessment purposes. What I find interesting is what other signals can be found in the data across surveys, areas, and times of the year.
How does big data help us better understand the marine environment in comparison to past methods and analyses?
Mike Jech: What started out as monitoring single species with single-frequency echosounders has blossomed into collecting data with multiple types of acoustic instrumentation deployed on a variety of platforms, over time-scales of seconds to months and spatial scales of centimetres to kilometres. This explosion of data provides unprecedented views of the aquatic environment, and will require us to transition from visually scrutinizing narrow bits of time and space to interpreting large features and patterns, and changes in those patterns, a task that can only be done using automated algorithms.
Laurent Berger: In terms of spatial and temporal coverage, there is no other observation method equivalent to acoustics. Sonars allow us to look at the ecosystem as a whole – from the surface to the seafloor – within the same dataset. Backscatter from various organisms varies depending on behaviour and physiology, so we need more data, at different frequencies and angles of insonification, to unambiguously identify targets. These data are already regularly collected but need to be explored and compared with ground-truth data from nets and video.
Carrie Wall: Past methods and analyses focused on data collected during a limited time period and relied predominantly on manual analysis. While tried and true, this approach is neither inherently scalable to large volumes of data nor adaptable to changing environments. By efficiently analysing more data collected in one area, or data collected across a larger area more frequently, we will be able to detect changes in acoustic patterns more readily and compare them with other environmental data to provide a holistic understanding of the marine environment.
What is meant by “enhanced data discovery and access” and how does this relate to the work of WGFAST?
Gavin Macaulay: The collection of acoustic data is carried out by many groups. They generally store their data locally (although some countries now have central stores), and this creates barriers to effective discovery and use of the data beyond the original collection purpose. These barriers are reduced by enhanced data discovery and access protocols and standards. Imagine the internet without a search engine – that is almost (but not quite) the state of fisheries acoustic data.
Carrie Wall: Two main factors drive enhanced data discovery and access: the first is a central repository hosting large volumes of well-documented fisheries acoustic data, and the second is standards-based metadata that feed data-discovery filtering. WGFAST's work to implement a metadata convention for processed data from active acoustic systems and the SONAR-netCDF4 convention ensures that metadata are standardized across the community and that file formats are more accessible – these are two examples of our contribution to enhanced data discovery and access.
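Because SONAR-netCDF4 files are ordinary netCDF4 files organized into conventional groups, generic tools can discover and read them without vendor software. A minimal sketch of what that looks like (the file name is a placeholder, and the group path shown is illustrative of the convention's layout, which may vary between files):

```python
import netCDF4       # low-level access, used here to walk the group hierarchy
import xarray as xr  # labelled, array-based access to one group at a time

path = "example_survey.nc"  # placeholder name for a SONAR-netCDF4 file

# Print the group tree (e.g. Environment, Platform, Provenance, Sonar, ...).
def walk(group, prefix=""):
    for name, sub in group.groups.items():
        print(prefix + name)
        walk(sub, prefix + name + "/")

with netCDF4.Dataset(path) as nc:
    walk(nc)

# Load one beam group as an xarray Dataset; the exact group path
# ("Sonar/Beam_group1" here) depends on how the file was written.
ds = xr.open_dataset(path, group="Sonar/Beam_group1")
print(ds)
```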
Mike Jech: The data needed to address “big” questions – such as the effects of a changing climate – are often collected by multiple institutions using a variety of instrumentation, and are frequently archived within those institutions rather than made available to the broader community. These data need to be accessible using efficient search engines that can be used by all members of the scientific community. The members of WGFAST represent many of these institutions, and we are working on all stages of addressing these questions. The role of WGFAST is to provide an open and transparent forum where ideas and methodologies can be discussed – and to foster collaborations that will move the community forward in a coherent way.
Why is it important to apply artificial intelligence and machine learning methods to big data?
Wu-Jung Lee: We are now in the era of Big Acoustic Data – technological developments over the past two decades have made it much easier for us to collect large quantities of fisheries acoustic data from a wide variety of ocean observing platforms, including ships, moorings, and autonomous surface and underwater vehicles. However, our data analysis capability has not grown proportionally: many existing acoustic data analysis pipelines rely on human experts to extract biological information manually, which leads to a scalability problem when analysing massive volumes of data. This mismatch is what I call the “acoustic data deluge”. Reproducibility is another critical issue with manual analysis, since each individual expert's analysis is strongly influenced by prior experience and their perception of the data and the environmental context. Many AI/ML methods are designed to handle large volumes of data efficiently. The models and algorithms are parameterized and therefore tractable, even when the parameters need to be optimized for each specific application. These features make them excellent candidates for addressing the scalability and reproducibility challenges brought by big data.
Why is it important to have open-source software and data?
Wu-Jung Lee: In my view, open-source software and data provide two key elements for addressing the big data challenge: accessibility and transparency, both of which are critical in accelerating method development and discovery. I believe that open-source software and data will help us move forward faster as a community. I created echopype to parse, combine, and analyse heterogeneous acoustic data collected from multiple types of echosounders without excessive data wrangling, which scientists very often spend a lot of time on (see the short example below).
An important feature of open-source software is that it is possible to look “under the hood” of any operation you choose to perform on the data. This is crucial in promoting true understanding of the operations – as there is often a leap between mathematical equations and software implementations – and in discovering and fixing bugs that would otherwise lie unnoticed yet continue to affect analysis results.
My hope is that by lowering the barrier to reading and performing computations on echosounder data, echopype will help broaden the use of active acoustic methods as a general tool for studying life in the ocean.
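As a rough illustration of that lowered barrier, here is a minimal sketch of reading and calibrating a raw echosounder file with echopype (the file name and output path are placeholders, and the calls reflect recent releases of the package, so details may differ between versions):

```python
import echopype as ep

# Parse a vendor .raw file into an EchoData object (placeholder file name;
# sonar_model must match the instrument, e.g. "EK60", "EK80", "AZFP").
ed = ep.open_raw("survey_file.raw", sonar_model="EK60")

# Optionally serialize the parsed data to netCDF with a SONAR-netCDF4-style
# group structure, for archiving and sharing.
ed.to_netcdf(save_path="./converted")

# Compute calibrated volume backscattering strength (Sv) as an xarray Dataset,
# ready for gridding, combining across surveys, or feeding into ML pipelines.
ds_Sv = ep.calibrate.compute_Sv(ed)
print(ds_Sv)
```

The output is a labelled, self-describing dataset rather than a vendor-specific binary file, which is the kind of interoperability the interviewees point to throughout this piece.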
Gavin Macaulay: Open data facilitates better use of the data. It reduces duplication of data collection and allows for more innovative use of the data. Open-source software hugely reduces the barriers to data analysis, especially for less wealthy countries and those outside of large institutions. Well-developed and tested software is preferable to the often “buggy” software developed by individuals. It also allows people to stand more easily on the shoulders of giants.
What challenges do you see in handling big data and how does WGFAST come into the picture?
Hjalte Parner: One challenge is how to bring all these data together into one big pool accessible to everybody. Having one hub would require quite a bit of permanent funding, which may not be possible. However, building a framework for a distributed system that requires the participation of the entire acoustic community could be a solution – WGFAST seems to be a good starting point for such a distributed network.
Carrie Wall: Identifying scalable, long-term, and financially achievable storage and access points is a big challenge for big data. Bringing the processing to the data, rather than the data to local processing machines, is a pathway that needs to be facilitated to enable researchers' big data processing. WGFAST can continue to provide expert knowledge and guidance in the standardization of metadata and file format/content.
Mike Jech: Access to data is a big challenge now, and efforts are underway to deal with discovery and access. However, acoustic data alone will not answer many questions. One of the next challenges is merging different data streams – such as acoustic, environmental, and biological data – to address “big” questions. Big data is the next step in the evolution of WGFAST, where it will continue to be an authoritative voice in developing and evaluating new technologies, and will expand its role to promote the use of these data for ecosystem research, monitoring, and management.
What role does WGFAST play in big data fisheries acoustics now and how do you see this role in the future?
Carrie Wall: WGFAST comprises the world's experts in fisheries acoustic science. They will lead the way in building machine learning and artificial intelligence methods to advance this field. WGFAST can serve to collect, collate, and share these methods and ensure open-source software and models are as interoperable as possible. By connecting this international community, WGFAST can foster collaboration to gather requirements and identify effective analytical methods for big data fisheries acoustics.
Laurent Berger: WGFAST has the expertise to validate the outcomes of “big data” in fisheries science, through peer review of ongoing developments and by being a forum for exchanging ideas.
Mike Jech: Currently, WGFAST is a forum for the presentation and discussion of ideas and methodologies. The annual meeting is a time when most members of the field come together to learn about what others are doing and to form collaborations. I see WGFAST taking a leadership role in facilitating the dissemination of information throughout the year, for example through dynamic websites that can keep pace with software developments; facilitating collaborations to foster open-source software development that will improve data access, discovery, and analysis; and providing some level of authority to help ensure correct interpretation of acoustic data.
Read more about the work of WGFAST and access their latest Scientific Report.
The work of WGFAST addresses Observation and exploration and Emerging techniques and technologies, two of ICES' scientific priorities. Discover all seven interrelated scientific priorities and how our network will address them in our Science Plan: “Marine ecosystem and sustainability science for the 2020s and beyond”.