Intro to gumbopp
Gumbopp is a small library that wraps Google’s gumbo HTML5 parser, originally written in C, in a modern C++ interface designed to integrate well with the STL. It does so while providing a complete compiler firewall between the underlying C library and the C++ interface. Care was taken to make every feature of the original library available through the C++ interface. The library tries to be forward-thinking and already uses some C++17 features; as the new standard progresses and compiler support improves, gumbopp should follow.
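For readers unfamiliar with the term, a compiler firewall is usually built with the pimpl idiom. The sketch below is not gumbopp's actual source: HtmlDocument, c_output, c_parse, and c_destroy are hypothetical stand-ins, used only to show how a public class can hide a C library's types behind a forward-declared implementation struct.

```cpp
// Sketch of a compiler firewall (pimpl idiom).
// Every name here is a hypothetical stand-in, not part of gumbopp or gumbo.
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// ---- What would live in the public header ----
class HtmlDocument {
public:
    explicit HtmlDocument(std::string source);
    ~HtmlDocument();                                  // defined where Impl is complete
    HtmlDocument(HtmlDocument&&) noexcept;
    HtmlDocument& operator=(HtmlDocument&&) noexcept;

    std::size_t SourceLength() const;

private:
    struct Impl;                                      // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- What would live in the .cpp file ----
// Stand-in for a C library's result struct and its create/destroy pair.
struct c_output { std::string text; };
static c_output* c_parse(const std::string& s) { return new c_output{s}; }
static void c_destroy(c_output* o) { delete o; }

// The C types appear only here, never in the header portion above.
struct HtmlDocument::Impl {
    explicit Impl(const std::string& s) : output(c_parse(s), &c_destroy) {}
    std::unique_ptr<c_output, void (*)(c_output*)> output;
};

HtmlDocument::HtmlDocument(std::string source)
    : impl_(std::make_unique<Impl>(source)) {}
HtmlDocument::~HtmlDocument() = default;
HtmlDocument::HtmlDocument(HtmlDocument&&) noexcept = default;
HtmlDocument& HtmlDocument::operator=(HtmlDocument&&) noexcept = default;

std::size_t HtmlDocument::SourceLength() const {
    assert(impl_ && "SourceLength() called on a moved-from document");
    return impl_->output->text.size();
}
```

Because the header portion names only Impl, code that includes it never sees the C types, and HtmlDocument remains a single pointer in size however the implementation changes.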
Getting Started
Getting started is simple. Clone the git repo, fetch its submodules, then build and install with CMake:
git clone https://github.com/zacharygrafton/gumbopp
cd gumbopp
git submodule init
git submodule update
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr ..
make
make install
The installation provides CMake configuration files to make using the library with CMake as simple as possible. To use the library from a CMakeLists.txt file:
find_package(gumbopp REQUIRED)
include_directories(${gumbopp_INCLUDE_DIRS})
# "test" is a reserved target name once testing is enabled, so use another name
add_executable(example ${sources})
target_link_libraries(example PRIVATE ${gumbopp_LIBRARIES})
Using the API
Using the API is simple: call the Parser::parse method to parse a string containing HTML, then use the returned Document object to find the nodes you need. Below is an example:
Document document = Parser::parse(R"html(
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
</head>
<body>
</body>
</html>
)html");
auto begin = document.begin();
auto end = document.end();
auto location = std::find_if(begin, end, [](const auto& node) {
    if(node.IsElement())
        return node.GetElement() == "html";
    return false;
});
if(location != end)
    std::cout << "Found <" << location->GetElement() << ">" << std::endl;
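Since the document's iterators work with standard algorithms, the lookup above can be wrapped in a small reusable helper. The following is only a sketch: find_element is a hypothetical helper and FakeNode is a stand-in type that mirrors just the IsElement() and GetElement() calls used above, with GetElement() assumed to return something comparable to a std::string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed node. It mirrors only the two member
// functions used in the example above and is NOT gumbopp's real Node type.
struct FakeNode {
    bool is_element;
    std::string tag;
    bool IsElement() const { return is_element; }
    const std::string& GetElement() const { return tag; }
};

// Returns an iterator to the first element node whose tag equals `tag`,
// or `last` if there is none. Works on any iterator range whose value type
// offers IsElement() and GetElement().
template <typename Iterator>
Iterator find_element(Iterator first, Iterator last, const std::string& tag) {
    return std::find_if(first, last, [&tag](const auto& node) {
        return node.IsElement() && node.GetElement() == tag;
    });
}
```

With a real document, the call would presumably look like find_element(document.begin(), document.end(), "html"), assuming the node type exposes the same two member functions.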
Note that the std::find_if search for the html element above could be removed, since the same node can be accessed directly by calling document.GetRoot(). The API also supports iterating through the Attributes defined on an element, like so:
Node node = *location;
for(const auto& attr: node.GetAttributes()) {
    std::cout << attr.GetName().to_string() << " = "
              << attr.GetValue().to_string() << std::endl;
}
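The same loop can be turned into a lookup by name. This is a sketch under stated assumptions: attribute_value is a hypothetical helper, and FakeAttribute is a stand-in whose GetName() and GetValue() return std::string, whereas the real accessors return a string-view-like type (hence the to_string() calls above).

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an attribute. It is NOT gumbopp's real Attribute
// type; it only mirrors the GetName()/GetValue() accessors used above.
struct FakeAttribute {
    std::string name;
    std::string value;
    const std::string& GetName() const { return name; }
    const std::string& GetValue() const { return value; }
};

// Returns the value of the first attribute called `name`, or std::nullopt
// when the range holds no such attribute. Requires C++17 for std::optional.
template <typename AttributeRange>
std::optional<std::string> attribute_value(const AttributeRange& attributes,
                                           const std::string& name) {
    for (const auto& attr : attributes) {
        if (attr.GetName() == name)
            return std::string(attr.GetValue());
    }
    return std::nullopt;
}
```

Returning std::optional keeps "attribute missing" distinct from "attribute present but empty", which a plain empty string could not express.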
The source is documented with Doxygen. The documentation could probably be improved with more examples, but I believe the API is fairly self-discoverable.
The Next Steps
Moving forward, it would be nice to have a way to search through the mini DOM in the spirit of CSS selectors. In the meantime, the library should be stable enough for everyday use. As a side note, binary compatibility should be easy enough to maintain going forward. Stay tuned for further announcements.