Floki

Floki is a simple HTML parser that lets you search for nodes using CSS selectors.

You can perform searches using classes, attributes, tag names and IDs. You can also combine selectors and use groups, like: "a.foo[data-action='bar'], .baz.zaz".

Check the documentation.

Usage

Assuming that you have the following HTML:

<!doctype html>
<html>
<body>
  <section id="content">
    <p class="headline">Floki</p>
    <span class="headline">Enables search using CSS selectors</span>
    <a href="http://github.com/philss/floki">Github page</a>
    <span data-model="user">philss</span>
  </section>
  <a href="https://hex.pm/packages/floki">Hex package</a>
</body>
</html>

Here are some queries that you can perform (with return examples):

Floki.find(html, "#content")
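# => [{"section", [{"id", "content"}],
# =>  [{"p", [{"class", "headline"}], ["Floki"]},
# =>   {"span", [{"class", "headline"}], ["Enables search using CSS selectors"]},
# =>   {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>   {"span", [{"data-model", "user"}], ["philss"]}]}]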


Floki.find(html, "p.headline")
# => [{"p", [{"class", "headline"}], ["Floki"]}]

Floki.find(html, "p.headline")
|> Floki.raw_html
# => <p class="headline">Floki</p>


Floki.find(html, "a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]


Floki.find(html, "a[href^=https]")
# => [{"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]


Floki.find(html, "#content a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]


Floki.find(html, "[data-model=user]")
# => [{"span", [{"data-model", "user"}], ["philss"]}]


Floki.find(html, ".headline, a")
# => [{"p", [{"class", "headline"}], ["Floki"]},
# =>  {"span", [{"class", "headline"}], ["Enables search using CSS selectors"]},
# =>  {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]

Each HTML node is represented by a tuple like:

{tag_name, attributes, children_nodes}

Example of node:

{"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.
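Because nodes are plain tuples, ordinary Elixir pattern matching works on them. The following is a minimal sketch that uses only the standard library (no Floki call is involved); the variable names are illustrative, not part of Floki:

```elixir
# A node in the {tag_name, attributes, children_nodes} shape described above.
paragraph = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the tuple with pattern matching.
{tag_name, attributes, children_nodes} = paragraph

# Attributes are a list of {name, value} tuples, so List.keyfind/3 can look one up.
{"class", class_value} = List.keyfind(attributes, "class", 0)

IO.inspect(tag_name)        # prints "p"
IO.inspect(class_value)     # prints "headline"
IO.inspect(children_nodes)  # prints ["Floki"]
```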

You can write a simple HTML crawler with Floki and HTTPoison:

html
|> Floki.find(".pages a")
|> Floki.attribute("href")
|> Enum.map(fn(url) -> HTTPoison.get!(url) end)

It is as simple as that!
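In practice, href values are often relative paths or non-HTTP links such as mailto:, and HTTPoison.get!/1 raises when a request fails, so a crawler will usually want to filter the list before fetching. The sketch below assumes the hrefs arrive as a list of strings, as in the pipeline above; the AbsoluteLinks module and the sample list are hypothetical and not part of Floki or HTTPoison:

```elixir
defmodule AbsoluteLinks do
  # Hypothetical helper: keeps only absolute http(s) URLs.
  def filter(hrefs) do
    Enum.filter(hrefs, fn href ->
      uri = URI.parse(href)
      uri.scheme in ["http", "https"] and is_binary(uri.host)
    end)
  end
end

# Sample values standing in for the output of Floki.attribute("href").
hrefs = ["https://hex.pm/packages/floki", "/about", "mailto:someone@example.com"]

links = AbsoluteLinks.filter(hrefs)
IO.inspect(links)  # prints ["https://hex.pm/packages/floki"]
```

The filtered list can then be piped into Enum.map with HTTPoison.get!/1 as shown above.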

Installation

You can install Floki by adding a dependency to your mix file (mix.exs):

defp deps do
  [
    {:floki, "~> 0.6"}
  ]
end

After that, run mix deps.get.

More about the API

To parse an HTML document, try:

html = """
  <html>
  <body>
    <div class="example"></div>
  </body>
  </html>
"""

Floki.parse(html)
# => {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}

To find elements with the class example, try:

Floki.find(html, ".example")
# => [{"div", [{"class", "example"}], []}]

To convert your node tree back to raw HTML (spaces are ignored):

Floki.find(html, ".example")
|> Floki.raw_html
# => <div class="example"></div>

To fetch some attribute from elements, try:

Floki.attribute(html, ".example", "class") # href or src are good possibilities to fetch links
# => ["example"]

You can get attributes from elements that you already have:

Floki.find(html, ".example")
|> Floki.attribute("class")
# => ["example"]

If you want to get the text from an element, try (using the HTML from the Usage section):

Floki.find(html, "p.headline")
|> Floki.text
# => "Floki"
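To illustrate how text can be collected from the tuple representation, here is a small recursive walk over a node tree. It is only a sketch of the idea, not Floki's implementation, and Floki.text may treat nested nodes differently; the TextWalk module is hypothetical:

```elixir
defmodule TextWalk do
  # Hypothetical helper: concatenates every text node found in a tree.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)
  def text({_tag_name, _attributes, children_nodes}), do: text(children_nodes)
  def text(text) when is_binary(text), do: text
end

tree =
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"span", [{"data-model", "user"}], ["philss"]}
   ]}

IO.inspect(TextWalk.text(tree))  # prints "Flokiphilss"
```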

License

Floki is released under the MIT license. Check the LICENSE file for more details.
