8000 GitHub - liveforeverx/floki: A HTML parser and searcher written in Elixir
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

liveforeverx/floki

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Floki Build status Floki version

Floki is a simple HTML parser that enables search using CSS like selectors.

You can search elements by class, tag name and id.

Check the documentation.

Example

Assuming that you have the following HTML:

<!doctype html>
<html>
<body>
  <section id="content">
    <p class="headline">Floki</p>
    <a href="http://github.com/philss/floki">Github page</a>
  </section>
  <a href="https://hex.pm/packages/floki">Hex package</a>
</body>
</html>

Here are some of the queries that you can perform (with return examples):

Floki.find(html, "#content")
# => {"section", [{"id", "content"}],
# =>  [{"p", [{"class", "headline"}], ["Floki"]},
# =>   {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]}

Floki.find(html, ".headline") # returns a list with the `p` element
# => [{"p", [{"class", "headline"}], ["Floki"]}]

Floki.find(html, "a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]

Floki.find(html, "#content a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]

Floki.find(html, ".headline, a")
# => [{"p", [{"class", "headline"}], ["Floki"]},
# =>  {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]

Each HTML node is represented by a tuple like:

{tag_name, attributes, children_nodes}

Example of node:

{"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.

You can write a simple HTML crawler (with support of HTTPoison) with a few lines of code:

html
|> Floki.find(".pages a")
|> Floki.attribute("href")
|> Enum.map(fn(url) -> HTTPoison.get!(url) end)

It is simple as that!

API

To parse a HTML document, try:

html = """
  <html>
  <body>
    <div class="example"></div>
  </body>
  </html>
"""

Floki.parse(html)
# => {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}

To find elements with the class example, try:

Floki.find(html, ".example")
# => [{"div", [{"class", "example"}], []}]

To fetch some attribute from elements, try:

Floki.attribute(html, ".example", "class") # href or src are good possibilities to fetch links
# => ["example"]

You can also get attributes from elements that you already have:

Floki.find(html, ".example")
|> Floki.attribute("class")
# => ["example"]

If you want to get the text from an element, try:

Floki.find(html, ".headline")
|> Floki.text

# => "Floki"

License

Floki is under MIT license. Check the LICENSE file for more details.

About

A HTML parser and searcher written in Elixir

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Elixir 100.0%
0