Floki

Floki is a simple HTML parser that enables searching with CSS-like selectors.

You can search for elements by class, tag name, and id, and combine these into descendant (`#content a`) and grouped (`.headline, a`) selectors.

Example

Assuming that you have the following HTML:

<!doctype html>
<html>
<body>
  <section id="content">
    <p class="headline">Floki</p>
    <a href="http://github.com/philss/floki">Github page</a>
  </section>
  <a href="https://hex.pm/packages/floki">Hex package</a>
</body>
</html>

Here are some of the queries you can perform, with example return values:

Floki.find(html, "#content")
# => {"section", [{"id", "content"}],
# =>  [{"p", [{"class", "headline"}], ["Floki"]},
# =>   {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]}

Floki.find(html, ".headline") # returns a list with the `p` element
# => [{"p", [{"class", "headline"}], ["Floki"]}]

Floki.find(html, "a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]

Floki.find(html, "#content a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]

Floki.find(html, ".headline, a")
# => [{"p", [{"class", "headline"}], ["Floki"]},
# =>  {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# =>  {"a", [{"href", "https://hex.pm/packages/floki"}], ["Hex package"]}]
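To illustrate how such selector queries can work over the node tuples described below, here is a minimal sketch in plain Elixir. It is a hypothetical module (`SelectorSketch`), not Floki's implementation: it only understands `#id`, `.class`, bare tag names, and space-separated descendant parts, and it takes the document as an already parsed list of `{tag, attributes, children}` tuples.

```elixir
defmodule SelectorSketch do
  # Hypothetical sketch, not Floki's implementation. Nodes follow the
  # {tag, attributes, children} shape; text nodes are plain strings.

  # Parse one simple selector: "#id", ".class", or a bare tag name.
  def parse("#" <> id), do: {:id, id}
  def parse("." <> class), do: {:class, class}
  def parse(tag), do: {:tag, tag}

  # Does a single element satisfy a parsed simple selector?
  def matches?({tag, _attrs, _children}, {:tag, tag}), do: true
  def matches?({_tag, attrs, _children}, {:id, id}), do: {"id", id} in attrs

  def matches?({_tag, attrs, _children}, {:class, class}) do
    # The class attribute may hold several space-separated names.
    case List.keyfind(attrs, "class", 0) do
      {"class", value} -> class in String.split(value)
      nil -> false
    end
  end

  def matches?(_node, _selector), do: false

  # Depth-first walk collecting every matching element in document order.
  def find_all(nodes, selector) when is_list(nodes) do
    Enum.flat_map(nodes, fn
      {_tag, _attrs, children} = node ->
        own = if matches?(node, selector), do: [node], else: []
        own ++ find_all(children, selector)

      _text ->
        []
    end)
  end

  # Space-separated parts act as descendant combinators ("#content a"):
  # each part is searched for inside the children of the previous matches.
  # Comma groups and other combinators are not handled.
  def find(nodes, selector_string) do
    [first | rest] = selector_string |> String.split() |> Enum.map(&parse/1)

    Enum.reduce(rest, find_all(nodes, first), fn selector, scope ->
      scope
      |> Enum.flat_map(fn {_tag, _attrs, children} -> find_all(children, selector) end)
      # Nested matching ancestors could yield the same element twice.
      |> Enum.uniq()
    end)
  end
end
```

Given the example document written out as nested tuples, `SelectorSketch.find(doc, "#content a")` returns only the GitHub link, mirroring the `Floki.find` query above.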

Each HTML node is represented by a tuple of the form:

{tag_name, attributes, children_nodes}

For example:

{"p", [{"class", "headline"}], ["Floki"]}

Attributes are a list of {name, value} pairs, and children are always a list, so even when an element's only child is its text, that text is wrapped in a list.
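Because children are always a list that mixes text strings and element tuples, walking the tree needs only one case per kind of value. The hypothetical `NodeText` module below is a small sketch of that idea (it is not Floki's code): it concatenates all text found under one node or a list of nodes.

```elixir
defmodule NodeText do
  # Hypothetical helper, not part of Floki. A node is either a string
  # (text) or a {tag, attributes, children} tuple with list children.

  # A list of nodes: join the text of each one, with no separator.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)

  # An element: its text is the text of its children list.
  def text({_tag, _attributes, children}), do: text(children)

  # A text node is returned unchanged.
  def text(string) when is_binary(string), do: string
end
```

For instance, `NodeText.text({"p", [{"class", "headline"}], ["Floki"]})` returns `"Floki"`.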

With the help of HTTPoison, you can write a simple HTML crawler in a few lines of code:

html
|> Floki.find(".pages a")                        # links inside elements with class "pages"
|> Floki.attribute("href")                       # their href values
|> Enum.map(fn(url) -> HTTPoison.get!(url) end)  # fetch each URL

It is as simple as that!
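In practice, `href` values are often relative (for example `/page/2`), and `HTTPoison.get!/1` needs an absolute URL. The hypothetical `CrawlUrls` module below is a sketch, assuming you know the page's base URL, that resolves and de-duplicates the links with Elixir's standard `URI.merge/2` before they are fetched.

```elixir
defmodule CrawlUrls do
  # Hypothetical helper, not part of Floki or HTTPoison: resolves each href
  # against base_url (RFC 3986 rules, via URI.merge/2) and drops duplicates
  # while keeping the first occurrence of each URL.
  #
  # Possible use in the crawler pipeline:
  #   html
  #   |> Floki.find(".pages a")
  #   |> Floki.attribute("href")
  #   |> CrawlUrls.absolute_urls("http://example.com/")
  #   |> Enum.map(&HTTPoison.get!/1)
  def absolute_urls(hrefs, base_url) do
    hrefs
    |> Enum.map(fn href -> base_url |> URI.merge(href) |> URI.to_string() end)
    |> Enum.uniq()
  end
end
```

Absolute hrefs pass through unchanged, so links to other hosts are kept as they are.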

API

To parse an HTML document, try:

html = """
  <html>
  <body>
    <div class="example"></div>
  </body>
  </html>
"""

Floki.parse(html)
# => {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}

To find elements with the class "example", try:

Floki.find(html, ".example")
# => [{"div", [{"class", "example"}], []}]

To fetch an attribute from the elements that match a selector, try:

Floki.attribute(html, ".example", "class") # "href" or "src" are useful for collecting links
# => ["example"]

You can also get attributes from elements that you already have:

Floki.find(html, ".example")
|> Floki.attribute("class")
# => ["example"]
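Since attributes are stored as a list of `{name, value}` pairs, reading one attribute from a list of nodes amounts to a comprehension over that list. The hypothetical `AttrSketch` module below is only a sketch of the idea, not Floki's implementation: nodes without the attribute, and text nodes, are simply skipped.

```elixir
defmodule AttrSketch do
  # Hypothetical sketch, not Floki's code. Each generator pattern filters:
  # only {tag, attributes, children} tuples are visited, and only
  # attribute pairs whose name equals `name` produce a value.
  def attribute(nodes, name) do
    for {_tag, attributes, _children} <- nodes,
        {^name, value} <- attributes,
        do: value
  end
end
```

With the two links from the Example section, `AttrSketch.attribute(links, "href")` returns both URLs in document order.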

To get the text from an element, try (using the HTML from the Example section):

Floki.find(html, ".headline")
|> Floki.text()
# => "Floki"

License

Floki is released under the MIT license. See the LICENSE file for more details.
