研究許久,用 inner join 兜出想要的 SQL statement:
select然後碰到麻煩了,當初 schema 存 item number 的 data type 是 text,結果發現 field 值若為 '17,458,850.00',SQLite 會把它 cast 為 17 ((int)),這不是我們要的。倘若值為 '17458850.00',那麼 SQLite 會把它正確轉成 17458850。這樣一來,原先 sourcing_base.py 就要略為修改
E.stock_code,
E.activity_date,
E.number as Equity,
A.number as Assets,
E.number / A.number as Ratio
from BalanceSheet as E
inner join
BalanceSheet as A
on E.stock_code = A.stock_code
and E.activity_date = A.activity_date
and E.item = '股東權益總計'
and A.item = '資產總計'
and E.report_type = 'C'
and A.report_type = 'C'
and E.activity_date = '2012-06-30'
cursor.execute(self.SQL_INSERT, (..., row[1].replace(',', '')))
update BalanceSheet set number = replace(number, ',', '');重新做上述 SQL command,好慢,只好建 index:
create index IX_BALANCE_SHEET_ITEM on BalanceSheet(item);好多了。此外,會計對負數表示法有兩種,同樣是負一,可以寫 -1,也可以寫 (1)。
因此當初用 text 存數字就是他媽的錯誤設計。改 schema 吧。
該用 int 或者是 real 呢?因為損益表有類似 something per share 的概念,因此有小數,只好全用 real。接著就是改 schema。Firefox plugin 的 SQLite Manager 有提供快速更改 column data type 的功能 ((!!))
ALTER TABLE "main"."BalanceSheet" RENAME TO "oXHFcGcd04oXHFcGcd04_BalanceSheet";很開心的按下去,結果悲劇了:index 掉了。喵的。只好重建:
CREATE TABLE "main"."BalanceSheet" (
"creation_dt" datetime DEFAULT (current_timestamp) ,
"stock_code" text NOT NULL ,
"report_type" character(1) NOT NULL ,
"report_date" datetime NOT NULL ,
"activity_date" datetime NOT NULL ,
"item" text NOT NULL ,
"number" real,
"revision" int DEFAULT (0) );
INSERT INTO "main"."BalanceSheet" SELECT ... FROM "main"."oXHFcGcd04oXHFcGcd04_BalanceSheet";
DROP TABLE "main"."oXHFcGcd04oXHFcGcd04_BalanceSheet";
create index IX_BALANCE_SHEET_ITEM on BalanceSheet(item);前面 index 很好理解,後面 unique index 就不好理解了。但也沒什麼,當初 create table 有設 unique 限制,SQLite 會建立相對應的 unique index。((其實也不難想像,如果 unique 不建立 index,那每次 insert 資料,都對 table scan 一次,怎麼可能這樣做呢?))
create unique index UX_BALANCE_SHEET_RECORD on
BalanceSheet(stock_code, report_type, report_date, activity_date, item, revision);
((
將來可以先把 index SQL command 先撈出來: SELECT sql FROM sqlite_master where type = 'index' and tbl_name = 'BalanceSheet'))
Stocktotal 總算恢復正常了。
現在我想找近十年自由現金流量為正的公司中,最近一次自有資本比率超過五成的有那些?
select * from
(
select
E.stock_code,
E.activity_date,
E.number as Equity,
A.number as Assets,
E.number / A.number as Ratio
from BalanceSheet as E
inner join
BalanceSheet as A
on E.stock_code = A.stock_code
and E.activity_date = A.activity_date
and E.item = '股東權益總計'
and A.item = '資產總計'
and E.report_type = 'C'
and A.report_type = 'C'
and E.activity_date = '2012-06-30'
and E.stock_code in
(
select stock_code from
(
select stock_code, activity_date, sum(number) as sum from CashFlowStmt
where report_type = 'C'
and strftime('%m-%d', activity_date) = '12-31'
and strftime('%Y', activity_date) > '2001'
and item in ('Operating', 'Financing')
group by activity_date, stock_code
)
where sum > 0
group by stock_code
having count(*) >= 10
)
) where Ratio > 0.5
ET: 3186ms,時間很漂亮 ((index => 用空間換時間))。38 家公司如下:
1301, 1303, 1319, 1434, 1460, 1525, 1730, 1734, 1802, 1905,
2106, 2323, 2325, 2330, 2340, 2367, 2387, 2428, 2460, 2474,
2478, 2492, 3031, 3060, 3311, 5706, 6155, 6202, 6271, 6283,
8103, 8210, 9905, 9917, 9925, 9938, 9939, 9943
然後我發現還欠許多資料,每月營收、股東權利變動表、股價、市場總本益比、三大法人進出等等。
沒有留言:
張貼留言