jimmyjose-dev/q: q - Run SQL directly on CSV or TSV files

q - Text as Data

q is a command line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).

q treats ordinary files as database tables and supports all SQL constructs, such as WHERE, GROUP BY and JOINs. It automatically detects column names and column types, and provides full support for multiple encodings.
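To make the "files as tables" idea concrete, here is a minimal sketch built on Python's built-in `csv` and `sqlite3` modules. It is not q's implementation. The `load_table` and `detect_type` helpers, the in-memory SQLite database, and the INTEGER/REAL/TEXT heuristic are assumptions chosen for illustration. The first line supplies the column names. Without a header, the sketch falls back to positional names `c1`, `c2`, ...

```python
import csv
import io
import sqlite3


def _parses_as(cast, values):
    """True if every value can be converted with cast (e.g. int or float)."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def detect_type(values):
    """Naive column type detection: INTEGER, then REAL, else TEXT."""
    if values and _parses_as(int, values):
        return "INTEGER"
    if values and _parses_as(float, values):
        return "REAL"
    return "TEXT"


def load_table(conn, name, text, delimiter=",", header=True):
    """Load delimited text into an SQLite table and return its column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header:
        columns, rows = rows[0], rows[1:]
    else:
        # Without a header row, fall back to positional names c1, c2, ...
        columns = [f"c{i + 1}" for i in range(len(rows[0]))]
    types = [detect_type([r[i] for r in rows]) for i in range(len(columns))]
    col_defs = ", ".join(f'"{c}" {t}' for c, t in zip(columns, types))
    conn.execute(f'CREATE TABLE "{name}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
    return columns


conn = sqlite3.connect(":memory:")
load_table(conn, "clicks", "uuid,score\na,0.9\nb,0.4\na,0.75\n")
print(conn.execute("SELECT count(DISTINCT uuid) FROM clicks").fetchone()[0])  # 2
print(conn.execute(
    "SELECT uuid, score FROM clicks WHERE score > 0.7 ORDER BY score DESC"
).fetchall())  # [('a', 0.9), ('a', 0.75)]
```

Because `score` is detected as REAL, SQLite converts the text values to numbers on insert. This lets `score > 0.7` compare numerically.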

q's web site is http://harelba.github.io/q/. It contains everything you need to download and use q in no time.

Download

Download links for all OSs are here.

Examples

A beginner's tutorial can be found here.

Example 1:

q -H -t "select count(distinct(uuid)) from ./clicks.csv"

Output 1:

229
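In these examples, the FROM clause names a file (`./clicks.csv`) or `-` for standard input, not a table. One way a tool could support this is to rewrite those names into table names before passing the query to a SQL engine. The `map_sources` helper below is hypothetical and uses a naive regular expression. It is not q's actual query parser.

```python
import re

# Matches the token that follows FROM or JOIN, e.g. "./clicks.csv" or "-" (stdin).
_SOURCE = re.compile(r"\b(from|join)\s+([^\s,()]+)", re.IGNORECASE)


def map_sources(query):
    """Replace file names after FROM/JOIN with generated table names.

    Returns the rewritten query and a dict mapping table name -> original source.
    """
    sources = {}

    def replace(match):
        keyword, source = match.group(1), match.group(2)
        # Reuse the same table name if a file appears more than once.
        for table, existing in sources.items():
            if existing == source:
                return f"{keyword} {table}"
        table = f"t{len(sources) + 1}"
        sources[table] = source
        return f"{keyword} {table}"

    return _SOURCE.sub(replace, query), sources


query, sources = map_sources("select count(distinct(uuid)) from ./clicks.csv")
print(query)    # select count(distinct(uuid)) from t1
print(sources)  # {'t1': './clicks.csv'}
```

A regex is enough for a sketch, but it would also rewrite `from` inside string literals. A real implementation would need to tokenize the SQL properly.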

Example 2:

q -H -t "select request_id,score from ./clicks.csv where score > 0.7 order by score desc limit 5"

Output 2:

2cfab5ceca922a1a2179dc4687a3b26e	1.0
f6de737b5aa2c46a3db3208413a54d64	0.986665809568
766025d25479b95a224bd614141feee5	0.977105183282
2c09058a1b82c6dbcf9dc463e73eddd2	0.703255121794
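Example 2 filters with `score > 0.7` and sorts with `order by score desc`. Both give numeric results only if `score` is treated as a number. The sketch below uses Python's `sqlite3` module as a stand-in SQL engine. This is an assumption, since the README does not say which engine q uses. It sorts the same raw values stored as TEXT and as REAL, which shows why automatic column type detection matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same raw text values loaded into a TEXT column and a REAL column.
conn.execute("CREATE TABLE as_text (score TEXT)")
conn.execute("CREATE TABLE as_real (score REAL)")
raw = [("9.5",), ("10.0",), ("0.7",)]
conn.executemany("INSERT INTO as_text VALUES (?)", raw)
conn.executemany("INSERT INTO as_real VALUES (?)", raw)  # REAL affinity converts the strings

text_order = [r[0] for r in conn.execute("SELECT score FROM as_text ORDER BY score DESC")]
real_order = [r[0] for r in conn.execute("SELECT score FROM as_real ORDER BY score DESC")]
print(text_order)  # ['9.5', '10.0', '0.7']  (lexicographic)
print(real_order)  # [10.0, 9.5, 0.7]        (numeric)
```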

Example 3:

q -t -H "select strftime('%H:%M',date_time) hour_and_minute,count(*) from ./clicks.csv group by hour_and_minute"

Output 3:

07:00	138148
07:01	140026
07:02	121826
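`strftime('%H:%M', ...)` in Example 3 is SQLite's date-formatting function. The use of `strftime` suggests that q evaluates queries with SQLite semantics, but treat this as an assumption. Under that assumption, the same per-minute grouping can be reproduced with Python's `sqlite3` module. The table contents below are hypothetical. Note that `date_time` values must be in a format SQLite's date functions accept, such as `YYYY-MM-DD HH:MM:SS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (date_time TEXT)")
# Hypothetical timestamps in a format SQLite's date functions accept.
conn.executemany(
    "INSERT INTO clicks VALUES (?)",
    [("2013-05-01 07:00:12",), ("2013-05-01 07:00:45",), ("2013-05-01 07:01:03",)],
)
rows = conn.execute(
    "SELECT strftime('%H:%M', date_time) AS hour_and_minute, count(*) "
    "FROM clicks GROUP BY hour_and_minute ORDER BY hour_and_minute"
).fetchall()
for minute, count in rows:
    print(f"{minute}\t{count}")  # 07:00  2, then 07:01  1 (tab-separated)
```

The explicit `ORDER BY` is added here because SQL does not guarantee the order of GROUP BY output.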

Example 4:

q -t -H "select hashed_source_machine,count(*) from ./clicks.csv group by hashed_source_machine"

Output 4:

47d9087db433b9ba.domain.com	400000

Example 5 (total size in MB per user/group in the /tmp subtree):

sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as total from - group by c5,c6 order by total desc"

Output 5:

mapred hadoop   304.00390625
root   root     8.0431451797485
smith  smith    4.34389972687
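Example 5 does not use -H, so its query refers to columns by position as c1, c2, ... Here c5, c6 and c7 are the owner, group and size fields of `find -ls`, and `-` reads the data from standard input. The sketch below mimics this with Python's `sqlite3` by splitting each line on runs of whitespace. The `load_whitespace` helper, the integer conversion and the sample lines are all hypothetical.

```python
import io
import sqlite3


def _convert(field):
    """Store integer-looking fields as integers so sum() works on them."""
    try:
        return int(field)
    except ValueError:
        return field


def load_whitespace(conn, name, stream):
    """Load header-less, whitespace-separated lines into columns c1..cN."""
    rows = [[_convert(f) for f in line.split()] for line in stream if line.strip()]
    width = max(len(r) for r in rows)
    # Pad short rows with NULLs so every row has the same number of columns.
    rows = [r + [None] * (width - len(r)) for r in rows]
    columns = ", ".join(f"c{i + 1}" for i in range(width))
    conn.execute(f"CREATE TABLE {name} ({columns})")
    placeholders = ", ".join("?" * width)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)


# Hypothetical `find /tmp -ls` lines: c5 = owner, c6 = group, c7 = size in bytes.
stdin = io.StringIO(
    "101    4 drwxrwxrwt 3 root  root     4096 Jan 1 12:00 /tmp\n"
    "102 2048 -rw-r--r-- 1 smith smith 2097152 Jan 1 12:01 /tmp/a.bin\n"
    "103 1024 -rw-r--r-- 1 smith smith 1048576 Jan 1 12:02 /tmp/b.bin\n"
)
conn = sqlite3.connect(":memory:")
load_whitespace(conn, "stdin", stdin)
result = conn.execute(
    "SELECT c5, c6, sum(c7)/1024.0/1024 AS total FROM stdin "
    "GROUP BY c5, c6 ORDER BY total DESC"
).fetchall()
print(result)  # [('smith', 'smith', 3.0), ('root', 'root', 0.00390625)]
```

Dividing by `1024.0` first forces floating-point division, so sizes below 1 MB do not truncate to zero.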

Example 6 (top 3 user IDs with the largest number of owned processes, sorted in descending order):

Note the use of the auto-detected column name UID in the query.

ps -ef | q -H "select UID,count(*) cnt from - group by UID order by cnt desc limit 3"

Output 6:

root 152
harel 119
avahi 2
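With -H, the first line of input supplies the column names, which is how UID becomes usable in Example 6. Below is a minimal sketch of this idea using Python's `sqlite3`. The `load_with_header` helper and the sample `ps -ef` lines are hypothetical. The sketch keeps the trailing CMD text in a single column. This is a choice made here, not documented q behavior.

```python
import io
import sqlite3


def load_with_header(conn, name, stream):
    """Take column names from the first line; split the rest on whitespace.

    The last column absorbs any remaining fields, so a CMD value such as
    "/sbin/init splash" stays in one column (a choice made for this sketch).
    """
    lines = [line.strip() for line in stream if line.strip()]
    columns = lines[0].split()
    rows = [line.split(None, len(columns) - 1) for line in lines[1:]]
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE {name} ({col_defs})")
    placeholders = ", ".join("?" * len(columns))
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)


# Hypothetical `ps -ef` output.
stdin = io.StringIO(
    "UID     PID  PPID  C STIME TTY       TIME CMD\n"
    "root      1     0  0 10:00 ?     00:00:01 /sbin/init splash\n"
    "root      2     0  0 10:00 ?     00:00:00 [kthreadd]\n"
    "harel   400     1  0 10:05 pts/0 00:00:00 bash\n"
)
conn = sqlite3.connect(":memory:")
load_with_header(conn, "procs", stdin)
result = conn.execute(
    "SELECT UID, count(*) cnt FROM procs GROUP BY UID ORDER BY cnt DESC LIMIT 3"
).fetchall()
print(result)  # [('root', 2), ('harel', 1)]
```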

Contact

Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.

Harel Ben-Attia, harelba@gmail.com, @harelba on Twitter

q on Twitter: #qtextasdata
